
PROBABILITY
AND ITS ENGINEERING USES

By
THORNTON C. FRY, Ph.D.
Member of the Technical Staff
Bell Telephone Laboratories, Inc.

NEW YORK
D. VAN NOSTRAND COMPANY, Inc.
EIGHT WARREN STREET
1928


Copyright 1928,
by
D. VAN NOSTRAND COMPANY, Inc.
All rights reserved, including that of translation
into the Scandinavian and other foreign languages

“Coincidences, in general, are great stumbling-blocks
in the way of that class of thinkers who have
been educated to know nothing of the theory of
probabilities: that theory to which the most glorious
objects of human research are indebted for the
most glorious of illustration.”
—The Murders in the Rue Morgue.


ACKNOWLEDGMENTS 


This textbook is the outgrowth of a set of notes originally
prepared for one of the “Out-of-Hour Courses” of Bell Telephone
Laboratories, and subsequently revised for use in a
course of lectures delivered at the Massachusetts Institute of
Technology in its Department of Electrical Engineering. I
am deeply indebted to both these institutions for arrangements
whereby it was possible to try out the material under actual
classroom conditions—a rare opportunity for a mathematician
engaged in industrial work.

I am also indebted in various ways to a number of 
individuals: 

To Dr. W. A. Shewhart and Mr. L. A. MacColl, of Bell 
Telephone Laboratories; to Mr. E. C. Molina and Dr. H. 
Nyquist, of the American Telephone and Telegraph Company;
and to Mr. Arne Fisher, of the Western Union Telegraph 
Company, for valuable advice and criticism. 

To Mr. R. A. Fisher, of the Rothamsted Experimental 
Station, and to his publishers, Messrs. Oliver and Boyd, of 
Edinburgh, for permission to use the tables contained in 
Appendices VIII and IX, which are copied, except for slight 
changes in notation, from his excellent monograph on Statistical 
Methods for Research Workers. 

To Miss Clara L. Froelich, of Bell Telephone Laboratories, 
for assistance without which the book could not well have been 
written. Not only are the figures almost entirely the work 
of her hands, but she has assisted materially in preparing the 
copy for the printer, among other things checking the entire 
manuscript for mathematical errors. It cannot be hoped that 
no errors remain, but her efforts have greatly reduced their
number.

Thornton C. Fry.


New York, 
February 15, 1928. 


CONTENTS


CHAPTER I
INTRODUCTION

1. The Fundamental Subject Matter of Probability
2. How Probability is Measured; The Unit of Measure
3. How Probability is Measured; The Fundamental Axioms and Conventions
4. How Probability is Measured; The Measure of Probability Defined
5. How Probability is Measured; Shortcomings of the Definition
6. Final Remarks


CHAPTER II
PERMUTATIONS AND COMBINATIONS

7. General Laws of Composition of Events
8. Definitions
9. Application of the General Laws of Composition to Permutations and Combinations; Some Typical Examples
10. Application of the General Laws of Composition to Permutations; General Theorems
11. Factorials; The Gamma Function
12. Restatement of General Theorems in Permutations
13. Application of the General Laws to Combinations
14. Some General Properties of $C_n^m$; Pascal’s Triangle
15. Some General Properties of $C_n^m$; The Binomial Theorem
16. The Solution of More Complicated Problems
17. A Complicated Problem in Permutations


CHAPTER III
ELEMENTARY PRINCIPLES OF THE THEORY OF PROBABILITY

18. Complementary Probabilities
19. Unconditional Probabilities
20. Conditional Probabilities
21. Compound Probabilities
22. Alternative Compound Probabilities
23. Some Instructive Illustrations; The Psychic Research Problem
24. Some Instructive Illustrations; A Generalization of the Psychic Research Problem
25. Some Instructive Illustrations; The Problem of Independent Trials
26. Some Instructive Illustrations; A Generalization of the Problem of Independent Trials
27. Some Instructive Illustrations; A Typical Urn Problem
28. Some Instructive Illustrations; Another Typical Urn Problem
29. Some Instructive Illustrations; A Problem in Matching
30. Some Instructive Illustrations; An Example in Computation
31. Some Instructive Illustrations; Another Urn Problem
32. Some Instructive Illustrations; Another Urn Problem


CHAPTER IV
PROBABILITY AND EXPERIMENT; BERNOULLI’S THEOREM

33. Introductory Remarks
34. Limits and Things which [illegible]
35. The Upper Bound of a Set
36. Regarding Probability as a Limit
37. Regarding Repeated Independent Trials
38. The Limiting Condition as the Number of Trials is Greatly Increased
39. Bernoulli’s Theorem
40. Résumé
41. Mathematical Justification
42. Stirling’s Formula
43. Another Approximate Formula
44. Justification of the First Half of Bernoulli’s Theorem
45. Justification of the Second Half of Bernoulli’s Theorem
46. Regarding the Experimental Determination of Probability
47. The Multiplication Theorem


CHAPTER V
PROBABILITY AND EXPERIMENT; BAYES’ THEOREM

48. The “Life on Mars” Paradox
49. Bayes’ Theorem
50. Some Instructive Illustrations; Bertrand’s “Box Paradox”
51. Some Instructive Illustrations; Another Urn Problem
52. Some Instructive Illustrations; The Bad Penny
53. The Uses to which Bayes’ Theorem may be Put
54. Some Instructive Illustrations; An Elementary Problem in Sampling


CHAPTER VI
DISTRIBUTION FUNCTIONS AND CONTINUOUS VARIABLES

55. The Random Choice of a Point on a Line Segment
56. A Paradox Associated with the Random Choice of a Point on a Line Segment
57. Extension of the Significance of the Preceding Paragraphs
58. Distribution Functions for Continuous Variables
59. A Variable Which is Not Distributed at Random
60. Distribution Functions Derived Empirically
61. Distribution Functions in Many Variables
62. Some Examples of Change of Variable in Distribution Functions
63. Change of Variable in Distribution Functions
64. Derivation of a Distribution Function for the Velocities of Gas Molecules
65. Some Instructive Illustrations; Change of Variable in Maxwell’s Equation
66. Information that can be Derived from (73) and (75)
67. Some Instructive Illustrations; A More Complicated Jacobian
68. The General Significance of the Jacobian


CHAPTER VII
AVERAGES

69. [illegible]
70. Mathematical Expectation
71. Derived Averages and Expectations
72. Average Values of Continuous Variables
73. [illegible]
74. [illegible]
75. Résumé
76. Some Instructive Illustrations; The General Case of Independent Trials
77. Some Instructive Illustrations; The General Case of Dependent Trials of the Sort Discussed in § 27
78. Some Instructive Illustrations; A Dice Problem
79. Some Instructive Illustrations; The St. Petersburg Paradox
80. The Expectation of a Probability

CHAPTER VIII
THE DISTRIBUTION FUNCTIONS MOST FREQUENTLY USED IN ENGINEERING

81. Introductory
82. Distribution Functions for Discrete Variables: The Binomial Law and Various Approximations to It
83. Distribution Functions for Discrete Variables: The Poisson Law as a Limiting Case of the Binomial Law
84. Definitions of the Phrases “Individually at Random” and “Collectively at Random”
85. Second Demonstration of the Poisson Law
86. Discussion of the Poisson Law; Problems to Which It is an Appropriate Solution
87. Discussion of the Poisson Law; [illegible] in a Telephone Exchange
88. Discussion of the Poisson Law; The General Problem of [illegible] Emission
89. An Approximation to the Poisson Law
90. The Normal Law
91. Empirical Families of Curves; Pearson’s Curves
92. Empirical Families of Curves; The Gram-Charlier Series
93. Gram-Charlier Approximations to the Binomial and Poisson Laws
94. Empirical Families of Curves; Transformation of Variable


CHAPTER IX
CURVE FITTING

95. The Primary Problem
96. The Hypothetical “Universe of Data” or “Population,” and the “Sample”
97. The Accepted Criterion as to Goodness of Fit
98. Some Instructive Illustrations; The Biased Die
99. Discussion of Example 49
100. Some Instructive Illustrations; Weldon’s Dice Data
101. An Approximation to the General Multinomial Law
102. The Measure of the Goodness of Fit, $P(>\chi^2)$
103. The Solution of Examples 50 and 51
104. Résumé of the Test of Goodness of Fit
105. Some Instructive Illustrations; Some Telephone Data
106. Some Instructive Illustrations; Chips Drawn from a Normal Universe
107. The Determination of a Suitable Distribution Function when no Theoretical Formula is Known
108. Some Instructive Illustrations; A Reconsideration of Weldon’s Dice Data
109. Sheppard’s Corrections to Moments Computed from Classified Data
110. The Distribution of Statistics
111. Control Charts


CHAPTER X
THE THEORY OF PROBABILITY AS APPLIED TO PROBLEMS OF CONGESTION

112. Introductory
113. Notation
114. General Assumptions
115. Some Problems of Lost Traffic
116. The Elementary Probabilities; Lost Calls Held
117. Introduction of the Assumption of Statistical Equilibrium; Lost Calls Held
118. The Probability Formula Corresponding to Assumptions 7 and 10
119. The Probability Formula Corresponding to Assumptions 8 and 10
120. The Probability Formula Corresponding to Assumptions 9 and 10
121. The Elementary Probabilities; Lost Calls Cleared
122. The Probability Formula Corresponding to Assumptions 7 and 11
123. The Probability Formula Corresponding to Assumptions 8 and 11
124. The Probability Formula Corresponding to Assumptions 9 and 11
125. Recapitulation of Formulae
126. Numerical Comparison of Formulae; The Dependence of the Probability of Loss upon the Number of Sources when the Traffic Density of the Group is Held Constant
127. Numerical Comparison of Formulae; The Dependence of the Allowable Traffic Density upon the Number of Sources when the Proportion of Loss is Fixed
128. Charts for Purposes of Computation
129. Some Hunting Problems
130. Individual Hunting from a Normal Position
131. Individual Hunting with Stay-Put Switches
132. Group Hunting with Stay-Put Switches
133. The Problem of Double Connections
134. Delays in Awaiting Service
135. Calls of Equal Length at Non-Cooperative Channels; The Probability of Congestion
136. Calls of Equal Length at Non-Cooperative Channels; The Expected Delay
137. Exponential Distribution of Holding Times; Delays at Cooperative Groups of Channels
138. Exponential Distribution of Holding Times; The Probability of Congestion
139. Exponential Distribution of Holding Times; The Expected Delay
140. Exponential Distribution of Holding Times; The Probability of a Delay Exceeding the Length T if Calls are Served in the Order in which They Originate
141. Exponential Distribution of Holding Times; The Proportion of Delayed [illegible]
142. Exponential Distribution of Holding Times; The [illegible] Calls


CHAPTER XI
FLUCTUATION PHENOMENA IN PHYSICS

143. Introductory Remarks; [illegible]
144. The Dynamics of Collision
145. The Probable Flux Across a [illegible]
146. [illegible]
147. The Fundamental Equation of the Kinetic Theory of Gases
148. The H-Function
149. Maxwell’s Law of Velocities
150. Pressure
151. The Expectation of the [illegible] of a Molecule
152. Number of Collisions
153. Expected (“Mean”) Free Path
154. Density Fluctuations
155. The Rapidity of Density Fluctuations
156. The Schottky Effect


APPENDICES

I. The Factorials of Integers
II. The Logarithms of Factorials
III. The Binomial Coefficients $C_n^m$
IV. The Normal Error Function
V. The Normal Law, Its Integral, and Its Derivatives up to the Sixth
VI. The Poisson Formula, P(j)
VII. The Poisson Formula, [illegible]
VIII. Pearson’s Criterion of Goodness of Fit, $P(>\chi^2)$
IX. Standard Deviations of Important Statistics
X. Criteria for Choice of Distribution Curves
XI. Statistics of the More Common Distribution Laws

INDEX


CHAPTER I 


INTRODUCTION 


§ 1. The Fundamental Subject Matter of Probability


It is the fundamental purpose of the theory of probability 
to answer such questions as: What is the probability of tossing 
an ace with a die? What is the probability that Christmas 
falls on Monday? What is the probability that the next 
child born in New York is a girl? What is the probability 
that Friday falls on Sunday? What is the probability that 
twenty sheets of paper in a package of 500 differ from the 
average by more than 1 per cent in thickness?

The subject deals with other questions—about “expectation,”
“correlation” and the like—but they are all subordinate
to the question, What is the probability of a certain
phenomenon? Whatever the subject matter, the phenomenon of
which the probability is sought is called an “event.”

Asking for the probability of an event in itself implies some 
degree of doubt as to its occurrence; that is, it implies the 
possibility that the event may not occur. Of course, there are 
certain causative or controlling factors which determine 
whether or not the event will occur. Divine intervention is 
not anticipated; and with sufficient information the answer 
to the question would be either “It is certain to occur” or
“It is certain not to occur.”

Thus, in the case of Friday falling on Sunday, the answer 
is “It is certain not to occur,” for we know that the thing 
cannot occur. Moreover, if the question, What is the prob- 
ability that Christmas falls on Monday? were asked about 
Christmas of this year, it would be possible to look it up in a 
calendar and find out on what day it actually falls. As it
either does or does not fall on Monday, the answer would then
be a statement of fact, not of likelihood. Unless a certain 
amount of ignorance exists such questions are trivial. 

But asking for the probability of an event implies more 
than mere ignorance. It also implies that ignorance is sometimes
of less consequence than other times. Take, for example,
the questions, What is the probability that the next child
born in New York is a girl? and What is the chance that
the next ten children born in New York are all girls? The
fact is not known in either case; but no doubt exists that 
ignorance is more serious in the first case than in the second. 
One event is less in doubt than the other. From this point of 
view the probability of an event evaluates the importance of 
our state of ignorance regarding it. 

This illustration reveals two phases of the intuitive concept 
of probability. One is that either event may occur: that is, 
the next child may be a girl, or the next ten children may be 
girls. This phase is purely qualitative. The other is that the 
first event is decidedly more probable than the second. This 
phase is quantitative: some probabilities exceed others. 
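
The comparison can be made concrete with a minimal sketch in Python. It rests on two assumptions that the text does not make at this point: that each birth is a girl with probability 1/2, and that successive births are independent. The names `P_GIRL` and `prob_all_girls` are illustrative only.

```python
from fractions import Fraction

# Assumption (not stated in the text): each birth is a girl with
# probability 1/2, independently of every other birth.
P_GIRL = Fraction(1, 2)


def prob_all_girls(births: int) -> Fraction:
    """Probability that `births` successive children are all girls."""
    return P_GIRL ** births


p_one = prob_all_girls(1)    # the next child is a girl
p_ten = prob_all_girls(10)   # the next ten children are all girls

print("one girl :", p_one)
print("ten girls:", p_ten)
print("ratio    :", p_one / p_ten)
```

Under these assumptions the first event is several hundred times as probable as the second, which is the quantitative phase of the idea; that both values are greater than zero corresponds to the qualitative phase.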

Consider also the matter of Christmas falling on Monday. 
There are seven days in the week and it is a matter of common 
knowledge that there is nothing in the arrangement of the 
calendar which tends to favor one of these days rather than 
another. This thought finds expression in the phrase: Christ- 
mas is “just as likely” to fall on Monday as on any of the 
other days. Moreover, we find it natural to say that Christmas 
is twice as likely to fall on either Monday or Tuesday as it 
is on Monday alone; and that it is three times as likely to fall 
on Sunday, Monday, or Tuesday, as on Monday alone. 

This illustration reveals two more intuitive ideas associated
with the concept of probability. One is the idea of “equally
likely.” The other is the idea that, under certain circumstances
at least, the probability of one or the other of several
events is the sum of their separate probabilities.
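
These two ideas can be illustrated before any unit of measure has been chosen. The sketch below is only an illustration, not a definition: it assigns every day of the week the same arbitrary weight (the value 1 is an assumption made for convenience) and adds the weights of distinct days. The names `weight` and `relative_likelihood` are hypothetical.

```python
from fractions import Fraction

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

# "Equally likely": every day carries the same weight. The unit is
# arbitrary at this stage; 1 is chosen only for convenience.
weight = {day: Fraction(1) for day in DAYS}


def relative_likelihood(days):
    """Add the weights of the distinct days named (each counted once)."""
    return sum(weight[day] for day in set(days))


monday = relative_likelihood(["Monday"])
ratio_two = relative_likelihood(["Monday", "Tuesday"]) / monday
ratio_three = relative_likelihood(["Sunday", "Monday", "Tuesday"]) / monday

print(ratio_two, ratio_three)
```

The ratios do not depend on the weight chosen, which is why the choice of a unit can be deferred to the next section.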




§ 2. How Probability is Measured; The Unit of Measure 


These ideas are of a purely intuitive nature. They are 
merely an appraisal of that common understanding of the word 
“probability” which makes it an element of speech. Define
it we cannot, any more than we can define “length” or 
“time” or “value” or other quantitative concepts; but we 
can define a method of measuring it, just as in the case of 
“length” or “time” or “value.” And just as we are 
accustomed in speaking of “length” to substitute the number 
for the fact, so, too, we shall generally, after we have passed 
on to the mathematical phases of our discussion, use the word 
“probability” for what we should, if we were exact, speak
of as the “measure of probability.” For the present, however,
we maintain the distinction, and, admitting our inability
to define probability, seek for a method of measuring it. 

Such a method flows naturally from the ideas already 
presented. Using again the Christmas illustration, the numer- 
ical measure of the probability of Christmas falling on Monday 
may be denoted by p. The value of p is unknown, but certain
relations into which it enters have already been stated.
For instance, the chance of Christmas falling on Tuesday is 
also p, and the chance of it falling on any other day is the same. 
Moreover, we have said that it appears natural to say that the 
chance of it falling on one or the other of several days is the 
sum of the probabilities for the separate days; the probability 
of it falling on one or the other of the seven days of the week 
is therefore 7p. But it is certain that Christmas falls on 
some day of the week: therefore 7p must be the number which 
represents certainty. 

What number shall be chosen for this purpose is purely 
optional, although some numbers may seem more appropriate 
than others. For example, infinity might seem peculiarly 
suitable, because it seems natural to say that the event is 
“infinitely likely to occur.” But if infinity is chosen, the 
equation

$$7p = \infty$$

is obtained; and this requires that p also be infinite. Thus the
choice leads to the logical absurdity that the chance of Christ- 
mas falling on Monday is represented by the same symbol 
as certainty, though it does not accord with the idea of cer- 
tainty. 

Unity is the only other number which recommends itself 
to denote certainty. It leads to the equation 


$$7p = 1$$

from which it is found that p = 1/7. This value does not
violate intuitive ideas, and is therefore more satisfactory than
the other. Thus it has become customary to adopt unity
as a symbol for certainty. As a consequence of this choice
all probabilities are confined to the range of proper fractions,
including the end points 0 and 1 which represent impossibility
and certainty,¹ respectively.
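
The effect of the two candidate units can be checked numerically. The following is a minimal sketch, assuming only that certainty is shared equally among the seven days; the function name `equal_share` is hypothetical, and `math.inf` stands in for the choice of infinity as the unit.

```python
import math
from fractions import Fraction

DAYS_IN_WEEK = 7


def equal_share(certainty, n_events):
    """Solve n_events * p = certainty for p."""
    return certainty / n_events


# Unity as the symbol for certainty: 7p = 1.
p_unity = equal_share(Fraction(1), DAYS_IN_WEEK)

# Infinity as the symbol for certainty: 7p = infinity.
p_infinite = equal_share(math.inf, DAYS_IN_WEEK)

print("unit = 1        ->", p_unity)
print("unit = infinity ->", p_infinite)
```

With infinity as the unit, the share assigned to Monday is indistinguishable from certainty itself, which is the absurdity described above; with unity the share is a proper fraction.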


§ 3. How Probability is Measured; The Fundamental Axioms
and Conventions 


The above illustration contains, by implication at least, 
the essential ideas needed for a general definition of the measure 
of probability. But before proceeding to such a general 
definition, it is desirable to sort out and restate, as best we can, 
the intuitive ideas (or axioms) upon which the illustration was
based. They are:


Axiom I.—The question, What is the probability that 
the event A occurs, has an answer. 

Axiom II.—This answer is quantitative; that is, it can 
be stated in terms of a unit of measure and a ratio (a pure 
number). 

Axiom III.—If two events differ in no other known
pertinent attribute than identity, they are equally likely. 

Convention I.—The unit of measure is certainty.

Convention II.—The scale of measure is to be so chosen
that the probability that either A or B happens is the sum
of their separate probabilities, so long as the events A and B
are mutually exclusive; that is, so long as it is impossible for
both of them to happen.

¹ In § 56, certain remarks are made regarding the use of these words.


The third of these axioms requires some discussion. Since
the conception of “equally likely” events is intuitive, it
cannot be defined, just as other intuitive concepts such as
Time, or Sweet, or I, cannot be defined. It is possible, however,
by intelligent consideration to give them a greater
depth of meaning. Put it this way. Defining an expression
enables one to learn what it means. We cannot do this with
intuitive ideas; but intelligent discussion may enable us to
appreciate more fully what we mean by them. Thus Axiom III
can in no way be called a definition of “equally likely,” but
it is consistent with the idea which that expression conveys
and may even be an aid in checking doubtful cases.

If the Christmas illustration is viewed in the light of this 
statement, there are seven possible events: Christmas may 
fall on any one of the seven days of the week. These events 
differ in identity. Otherwise they could not be thought of as 
distinct events at all. The days themselves differ in other 
known attributes: Sunday is a day when people go to church, 
Monday is wash-day, Election Day falls on Tuesday, and 
Saturday is (or once was) pay-day. Of necessity the events 
themselves partake to some extent of these attributes; for 
instance, “Christmas falls on Monday” partakes of the
attribute of the day and may also be phrased “Christmas
falls on wash-day.” These attributes, however, are not
pertinent to the question at hand: our state of ignorance 
would be just as important—and no more so—if they were 
unknown, or even untrue. If habits changed and Thursday
became the conventional wash-day, Christmas would still be
just as likely to fall on Monday as before and no more so.

There may be other attributes to these days which are
pertinent, but which are unknown. For instance, if the
question is directed at Christmas of this year, one of the days
of the week possesses the attribute of being “the day of the
week on which Christmas does fall.” This is obviously an
essential attribute; but so long as it is unknown the probability
that Christmas falls on Monday is unaffected.

It may be argued that Axiom III proceeds in a circle because
“pertinent attributes” means merely those which influence the
likelihood of the event. This is true, and it would be a valid
objection to a definition; but it must be remembered that III
is not a definition: it is merely an attempt to state in other
words the intuitive meaning of the phrase “equally likely.”

And finally, a word of caution should be said about confusing 
the absence of any pertinent difference between two events, with
lack of sufficient knowledge to evaluate the importance of the 
difference. I ask, If I receive just one telegram to-day, what 
is the chance that it will be between 1 and 2 o’clock in the 
afternoon? There are obviously 24 hours in the day, and I 
do not know what the probability is for any one of them. 
Shall I therefore assert that they are equally likely? Obvi- 
ously not; there are vastly more people awake during the hour 
in question than between 1 and 2 in the morning, for example, 
and this certainly is a pertinent difference so far as the likeli- 
hood of a telegram being received is concerned. In a later 
section (§ 48) we shall again refer to this point, which is the
fundamental error in a widely quoted paradox.²

² There are two schools of thought, calling themselves “insufficient reasonists” and
“cogent reasonists,” both of whom accept Probability as an a priori fact, but whose
ways part on the “definition” of the term “equally likely.” The insufficient reasonists
say, as we have done (though we do not call it a definition), that two events are equally
likely if there is no reason to think them otherwise. The cogent reasonists say they
are equally likely if there is a cogent reason for thinking them so.

In so far as one who regards the concept of “equally likely” as intuitive can be
said to belong to either school, I am an insufficient reasonist: to me the most “cogent”
reason for thinking two things equally likely is the absence of any reason for thinking
them otherwise. Therefore the paradox in question, which in one form or another is
always used to phrase the objection of the cogent reasonist to the other point of view,
is naturally an object of concern.

The present would be the proper time to consider it, except that the paradox itself
makes use of certain facts which we shall not be able to regard as established until the
end of Chapter IV. We must therefore postpone its consideration until that time.


Before closing this section a word should also be said about 
the conventions by means of which the scale of measure is 
defined. Regarding this scale it has already been agreed that 
unity shall represent certainty, and zero impossibility. The 
end-points of the scale are therefore fixed. The method of 
division is also provided for by Convention II; but it must 
be emphasized that this convention is limited to mutually 
exclusive events, that is, to events of which one at most can 
happen. Why such a limitation is necessary will become 
evident from a simple example. If all days but Sunday are
“week-days,” the probability of Christmas falling on a week-day
is 6/7, for it is the sum of the probability of Christmas falling
on Monday, on Tuesday, and so on. But the probability of it
falling “either on Monday or on a week-day” is not therefore
6/7 + 1/7 = 1. Such a result is absurd.
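
The week-day example can be reproduced in a few lines. This is a minimal sketch, assuming the probability 1/7 for each day obtained in § 2; the helper `prob_of` is a hypothetical name, and it counts each distinct day once.

```python
from fractions import Fraction

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]
P_DAY = Fraction(1, len(DAYS))      # probability of each single day (from § 2)
WEEK_DAYS = set(DAYS) - {"Sunday"}  # all days but Sunday


def prob_of(days):
    """Probability that Christmas falls on one of `days`; each distinct day counted once."""
    return P_DAY * len(set(days))


# Misapplying Convention II to events that are NOT mutually exclusive:
naive_sum = prob_of({"Monday"}) + prob_of(WEEK_DAYS)

# Treating "Monday or a week-day" as one group of distinct days:
correct = prob_of({"Monday"} | WEEK_DAYS)

print("naive sum:", naive_sum)
print("correct  :", correct)
```

The naive sum counts Monday twice and so reaches certainty; the second computation counts it once and returns the probability of a week-day.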

The difficulty is not peculiar to probability theory, but is a 
fundamental one met in all methods of measurement by direct 
comparison. We may say, “A rod is two units long if it 
contains two parts each a unit in length,” but in so saying we 
obviously mean mutually exclusive parts. We can cut off a
unit of length from either end of a bar which is only 1.1 units 
in length, both of which are therefore contained in it; but 
we do not therefore conclude that its length is 2. 


§ 4. How Probability is Measured; The Measure of Probability 
Defined 


We are now prepared to make use of these axioms and 
conventions in the formulation of an exact definition of the 
“measure of probability.” We begin by noting that the
argument of § 2, from which the number 1/7 was derived, made
use of the following facts: 


(1) The event for which the probability was sought (Christ- 
mas falling on Monday) is one of a group of events (correspond- 
ing to the various days of the week). 


(2) The events are mutually exclusive. 


(3) They are equally likely. 


(4) The group is “complete”; that is, one or the other of 
the events must happen. 


These four facts made it possible to set up the equation 7p = 1,
from which the probability of the event happening was obtained
in the form p = 1/7.

If the complete group had contained m events, the prob- 
ability of a particular one occurring would obviously have 
been obtained from the equation mp = 1, and would therefore 
have been p = 1/m.

To the question, What is the probability that Christmas
falls either on Sunday, Monday, or Tuesday? Convention
II may be applied. The answer is the sum of the probabilities
of the separate events contained in this group of three. As
the probability of each of these is 1/7, the answer must be 3/7.
More generally, if there were m members in the complete
group and it were required to find the probability that some
one of a smaller group of n events took place, it would only be
necessary to add together n fractions each of the value 1/m.
The answer is therefore n/m. This leads at once to the
definition:


If a subgroup of n events is contained in a complete group of 
m mutually exclusive and equally likely events, the probability of 
some one of the subgroup occurring is measured by n/m. 
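
The definition can be tried out numerically. The following is only a sketch: the function name `probability` and the list `DAYS` are illustrative choices, not part of the text, and the code simply assumes (it cannot check) that the events supplied are equally likely.

```python
from fractions import Fraction

def probability(subgroup, complete_group):
    """Return n/m for a subgroup of a complete group of mutually
    exclusive events that are ASSUMED to be equally likely."""
    group = set(complete_group)
    sub = set(subgroup)          # a set counts each event only once
    if not sub <= group:
        raise ValueError("subgroup must be contained in the complete group")
    return Fraction(len(sub), len(group))

# Hypothetical labels for the seven events of the Christmas example.
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]
WEEK_DAYS = DAYS[1:]             # all days but Sunday

p_monday = probability(["Monday"], DAYS)
p_three_days = probability(["Sunday", "Monday", "Tuesday"], DAYS)
p_week_day = probability(WEEK_DAYS, DAYS)

# "Monday or a week-day": the two events overlap, so the subgroup is
# their union, not the two lists placed end to end.
p_monday_or_week_day = probability(set(["Monday"]) | set(WEEK_DAYS), DAYS)

print(p_monday, p_three_days, p_week_day, p_monday_or_week_day)
```

Because the subgroup is held as a set, "Monday" is counted once even though it also appears among the week-days, which is the point of the restriction to mutually exclusive events.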


§ 5. How Probability is Measured; Shortcomings of the Definition


The definition of the measure of probability is not without
its shortcomings. In the first place, the definition virtually 


1 This definition is often stated in somewhat different terms: Each event in the 
complete group is called a “case,” the events in the desired subgroup being said to be 
“favorable,” and the subgroup itself being called “the event,” regardless of whether it
is an individual event or some combination of events. Using the words in this sense, 
the definition appears in the form: 

The probability of an event happening is the ratio of the number of favorable cases to 
the total number of cases, all cases being equally likely. 

We should add that, although we have conformed to the usual custom of calling 
this statement a “definition,” it is in fact a theorem. 


implies that both n and m are finite; otherwise “indeterminate
forms” arise. This difficulty is a superficial one, however,
for infinite quantities never have a meaning except as the
result of a limiting process, and the same limiting processes are
applicable to the ratio n/m as to any other number.

In the second place, a similar difficulty arises when we ask 
for the probability that a shot will miss its mark by a distance 
of between 11 and 12 feet; for distance is a continuous variable, 
which raises difficulties similar to those met in defining irra- 
tional numbers. We shall find, however, when the occasion 
arises, that we are capable of overcoming this difficulty also. 

There is, however, another difficulty of a much more
fundamental sort. If we ask the question, What is the probability
that the next child born in New York is a girl? it is
impossible to build up any group of events which satisfies the
conditions of our definition, for though the group “boy, girl” is
complete and though its events are mutually exclusive, they are
known not to be equally likely. There are many examples of this
sort: almost everything about which “statistics” are taken would
be suitable as an illustration. In fact, among the questions
which arise in many fields of research there are so many more
to which the definition cannot be applied than there are to
which it can, that many statisticians have been led to seek a
foundation for the whole subject of probability in the gathering
of statistics, rather than in pure a priori logic. While I am
convinced that their definition is untenable as a foundation
upon which to build a logical structure, we shall find that it,
or at least something very like it, will serve an excellent
practical purpose in overcoming the shortcomings of our
definition.


§ 6. Final Remarks 


Once more the importance of the words “complete,”
“equally likely” and “mutually exclusive” in the definition,
and of “mutually exclusive” in Convention II, must be
stressed; and to make more emphatic the fact that human
nature is prone to forget them, a mistake of a great mathematician
will be used to point the remarks.

D’Alembert, when asked for the probability that “heads”
will appear at least once in two throws of a penny, argued as
follows: Heads appear either first, last, or not at all. There
are thus three events, two of which are included in the subgroup
desired. Hence the chance of heads appearing during the two
throws is 2/3.

But if “heads first” (or last) means “heads then and
tails the other time” the group is not complete, for “heads
both times” is also possible; while if “heads first” (or last)
means “heads then, no matter what appears the other time”
the events are neither equally likely nor mutually exclusive:
not equally likely because they differ in the essential attribute
that two depend on the result of a single throw only, while
the other combines the results of two throws; and not mutually
exclusive because “heads both times” is included both in
“heads first” and in “heads last.” Actually, as d’Alembert
thought of the problem, the question was answered if heads
appeared on the first throw, and a second throw was not
needed; hence to him “heads first” meant “heads first no
matter what happens last”; while “heads last” meant “tails
first and heads last.” His group, then, was complete and
mutually exclusive; but the events were not equally likely.

Fortunately a suitable group can be found. It is 


heads—heads, 
heads—tails, 
tails —heads, 
tails —tails. 


As three of these events produce at least one head, the correct
answer is 3/4.
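
The suitable group can also be listed mechanically. In the sketch below the names `outcomes`, `favorable` and `dalembert_group` are illustrative only; the weights attached to d'Alembert's three events are computed from the four equally likely cases, on the assumption that the penny is a fair one.

```python
from fractions import Fraction
from itertools import product

# The complete group of four equally likely, mutually exclusive events.
outcomes = list(product("HT", repeat=2))    # HH, HT, TH, TT
favorable = [o for o in outcomes if "H" in o]
p_at_least_one_head = Fraction(len(favorable), len(outcomes))

# d'Alembert's three events as he conceived them, each weighted by the
# number of the four cases it contains.
dalembert_group = {
    "heads first (second throw immaterial)": [o for o in outcomes if o[0] == "H"],
    "tails first and heads last": [o for o in outcomes if o == ("T", "H")],
    "tails both times": [o for o in outcomes if o == ("T", "T")],
}
dalembert_weights = {name: Fraction(len(cases), len(outcomes))
                     for name, cases in dalembert_group.items()}

print(p_at_least_one_head)
for name, weight in dalembert_weights.items():
    print(name, weight)
```

The weights come out unequal, which shows in figures why counting "two events out of three" fails even though the three events are complete and mutually exclusive.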
* * *


The above discussion is far from complete, and raises a 
number of questions of a logical nature, the attempt to answer 
which would be of interest if it were consistent with the main 
purpose of the text. These questions, however, must be
passed by, and we take up instead a review of certain algebraic
laws that will frequently be needed. 


REFERENCES FOR OUTSIDE READING


1. Whitworth: Choice and Chance, pp. 118-130.

2. Chrystal: Algebra, Vol. II, pp. 566-580.

3. Coolidge: Probability, Chapter I.

4. Venn: Logic of Chance, Chapters I and IV.

5. Keynes: A Treatise on Probability, Chapters I-VII.

6. Laplace: Théorie Analytique des Probabilités, Introduction,
as far as heading “De l’Espérance” (pp. i-xiii in the edition of 1814).

7. Czuber: Wahrscheinlichkeitsrechnung, Vol. I, pp. 1-21.


CHAPTER II 
PERMUTATIONS AND COMBINATIONS 


§ 7. General Laws of Composition of Events 


The study of permutations and combinations rests upon 
two general laws regarding the composition of events. 


First Law of Composition. If an event A can happen
in m ways and an event B can happen in n other ways,
“either A or B” can happen in m + n ways.


This law is so simple that it scarcely requires proof. A 
few illustrations will serve to establish its validity. Suppose 
there are three ways of going from New York to Philadelphia 
and two ways of going from New York to Boston. Then the 
number of ways of going either to Philadelphia or to Boston 
is obviously 3 + 2 or 5. This is in agreement with the law, 
but it does not indicate why the ways in which the event
B happens must be different from the ways in which A happens,
as is implied by the word “other.” A second illustration will
make this point clear.

Suppose there are three routes to Philadelphia, one of which 
leads through Princeton, and that there is no other route to 
Princeton. Then although there are three routes to Phila- 
delphia and one to Princeton the number of ways of going 
“either to Philadelphia or to Princeton” is not four, but three.
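
Both route illustrations can be checked by holding the ways of each event as a set. This is a sketch only; the route labels are hypothetical and serve merely to tell the ways apart.

```python
# Hypothetical labels for the routes in the two illustrations.
philadelphia_routes = {"first route", "second route", "route via Princeton"}
boston_routes = {"shore route", "inland route"}
princeton_routes = {"route via Princeton"}   # the only route to Princeton

def ways_either(a_ways, b_ways):
    """Number of ways in which 'either A or B' can happen."""
    return len(a_ways | b_ways)              # the union counts shared ways once

to_phila_or_boston = ways_either(philadelphia_routes, boston_routes)
to_phila_or_princeton = ways_either(philadelphia_routes, princeton_routes)

print(to_phila_or_boston)      # the ways are all "other" ways, so m + n
print(to_phila_or_princeton)   # one way is shared, so fewer than m + n
```

The sum m + n agrees with the size of the union only when the two sets have no way in common.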


It is evident that the word “other” is an essential part of the
law.


Second Law of Composition. If an event A can
happen in m ways and thereafter an event B can happen
in n ways, “both A and B” can happen in this order in
mn ways.

If a penny is tossed it may fall in two ways, heads or tails.
If a die is thrown it may fall in any one of six ways. Hence,
according to the second law, the number of results which can
be obtained by tossing first the penny and then the die is
2·6 = 12. This result can be checked by listing the separate
possibilities. They are


1. heads and ace         7. tails and ace
2. heads and deuce       8. tails and deuce
3. heads and three       9. tails and three
4. heads and four       10. tails and four
5. heads and five       11. tails and five
6. heads and six        12. tails and six
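
The same list can be produced by machine. A minimal sketch, in which the variable names are illustrative:

```python
from itertools import product

penny = ["heads", "tails"]
die = ["ace", "deuce", "three", "four", "five", "six"]

# Every way of the penny paired with every way of the die.
results = list(product(penny, die))
for number, (coin, face) in enumerate(results, start=1):
    print(f"{number:2d}. {coin} and {face}")

# Tossing the die first gives the same number of results.
reversed_order = list(product(die, penny))
```

Here the number of ways of the second event does not depend on how the first turned out, so the product of the two totals is all that is required.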


As a second illustration, consider the modified checker- 
board shown in Fig. 1, and ask: In how many ways can a man 
be moved from the top row, and thereafter a man from the 
middle row? It is immediately obvi- 
ous that every man in the top row has 
two possible moves, which makes a 
total of 8 ways of moving a man in 
the top row. After moving the man 
in the top row there are always two 
moves possible from the middle row. 
Hence, the number of possible combinations
of moves is 8·2 = 16. Again the combinations of
moves can be listed, and will be found to check this result.

Both these illustrations agree with the second law. There 
are, however, two essential differences between them. In the 
first illustration the result was obtained by multiplying the 
total number of possible ways in which a penny may fall by 
the total number of ways in which a die may fall. In the 
second illustration, on the other hand, the total number of 
different moves which are possible from the middle row is 8, 
one for each of the end men and two for the others; but the 
correct answer is not obtained by multiplying this number by 
the 8 possible moves in the top row. 

The second difference lies in the fact that in tossing a die
and a penny it makes no difference which one is tossed first.
The number of possible combinations is the same in either case.
In the checkerboard problem the number of ways in which
a man can be moved from the middle row and thereafter a
man from the top row is zero, since a man cannot be moved
from the middle row at all until a space is opened up for him.

Fig. 1.

The cause of these differences lies in the fact that the 
events which take place in the first illustration are independent, 
while those which take place in the second illustration are not. 
The way a penny falls exerts no conceivable influence over 
what the die may do; but the way in which the man is moved 
from the first row determines which men are released in the 
second row and what moves they may make. The necessity
of taking account of such dependence reveals itself in the
presence of the words “thereafter” and “in this order” in
the statement of the second law.


PROBLEMS 


1. Regarding the alphabet as consisting of 21 consonants and 
5 vowels, how many distinct five-letter words are possible, each 
having three consonants and two vowels alternated? 


2. In how many of the words of Problem 1 does no letter occur 
more than once? 


3. For purposes of cable code, where a different charge is made 
according as the combinations of letters are pronounceable or unpro- 
nounceable, it might be desirable to obtain a very large number of 
words of the sort mentioned in Problem 1. How many vowels should 
an alphabet of 26 letters have, to be most suitable for this purpose? 


4. The Greek alphabet has only 24 letters: 17 consonants and 
7 vowels. Is it better or worse than the English for the purpose of 
Problem 3?


§ 8. Definitions 


From the standpoint of the study of permutations and
combinations, a group of objects has three characteristics:
the kinds of objects included in the group, the number of
objects of each kind, and the way in which they are arranged.
Thus in the group of letters


a b a a,


the fact that there are two kinds of objects, that there are
three of one kind and one of the other, and the way in which
these are arranged in the group would be the pertinent information.


Two groups of objects are said to form different “combina- 
tions” if they differ in the number of any kind of object included. 
Thus the combination


a b a a


differs from the combination


a b a,


because the number of a’s is not the same as before. It also
differs from the combination


a b a b


for the same reason, although the total number of objects in
the group in this case is left unchanged.
On the other hand, the combinations


a b a a,
a a b a,
b a a a,
a a a b,


are all the same, since the number of a’s and the number of 
b’s is the same in each case. 


Two groups of objects are said to form different “permutations”
in either of two cases: (a) if they form different combinations;
(b) if they form identical combinations but differ in arrangement.


Thus the different arrangements of the four letters a, b, c, d
are identical combinations because each of them has one
a, one b, one c and one d. They are all different permutations,
however, because the arrangement of the letters is different in
each group.
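
The two definitions can be put to work directly. In this sketch a combination is represented by a tally of how many objects of each kind it contains, and a permutation by the ordered list of objects; the function names are illustrative assumptions.

```python
from collections import Counter

def same_combination(group_1, group_2):
    """Identical combinations: the same number of every kind of object."""
    return Counter(group_1) == Counter(group_2)

def same_permutation(group_1, group_2):
    """Identical permutations: identical combinations, identically arranged."""
    return list(group_1) == list(group_2)

print(same_combination("abaa", "aba"))    # differ in the number of a's
print(same_combination("abaa", "abab"))   # same total, different counts
print(same_combination("abaa", "aaba"))   # only the arrangement differs
print(same_permutation("abaa", "aaba"))   # so the permutations differ
```

Two groups that fail the first test necessarily fail the second, which is case (a) of the definition of different permutations.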
The groups 
a b c d,
oe 

are different combinations because they have different num- 
bers of c’s. They are therefore also different permutations. 
As another illustration 

as Dread; 


a, bacre, 


are different combinations and therefore also different permu- 
tations. 


§ 9. Application of the General Laws of Composition to Permu- 
tations and Combinations; Some Typical Examples 


The two general laws stated in § 7 make possible the solu- 
tion of many of the problems of permutations and combinations. 
A few examples will illustrate the method of procedure. 


Example 1. How many permutations of three letters each can be
formed from the letters a b c d?


The answer to this question may be obtained by thinking 
of the process involved in writing down the various permuta- 
tions. In writing down any permutation there are just four 
ways in which the first letter can occur. After this event 
has taken place there are three ways of writing the second 
letter. Finally, the third letter can be written in only two 
ways. Hence the number of ways of writing three letters is 
4·3·2 = 24. There is no other conceivable way of putting
the letters together. Hence there are just 24 permutations 
of four letters three at a time. 
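
The count can be confirmed by writing the permutations out with the standard library. A minimal sketch (the names `perms` and `columns` are illustrative):

```python
from itertools import permutations

letters = "abcd"

# All ordered selections of three distinct letters.
perms = ["".join(p) for p in permutations(letters, 3)]

# Group them by first letter: four groups, one for each way of
# writing the first letter.
columns = {first: [p for p in perms if p[0] == first] for first in letters}

print(len(perms))
for first, column in columns.items():
    print(first, column)
```

Each group holds 3·2 = 6 entries, the number of ways of completing a permutation once its first letter is fixed.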

Table I shows these permutations arranged in the order in
which they were supposed to be obtained. The four ways of
writing the first letter correspond to the four vertical columns.
In each of these columns there are three pairs corresponding
to the three ways of choosing the second letter. Finally, the
two members of each pair correspond to the two ways in which
the last letter may appear.


TABLE I
PERMUTATIONS OF FOUR LETTERS THREE AT A TIME

abc    bac    cab    dab
abd    bad    cad    dac
acb    bca    cba    dba
acd    bcd    cbd    dbc
adb    bda    cda    dca
adc    bdc    cdb    dcb


Example 2. How many combinations of three letters each can be
formed of the letters a b c d?


Because of the simple numbers involved the easiest way 
of getting the answer to this problem does not involve the 
two general laws at all. Whenever a group of three is chosen 
from four letters, one is left; and it is evident that there will 
be as many different combinations of three letters as there are 
different ways of having one letter left over; that is, four.

Unfortunately, most problems cannot be so easily solved. 
To illustrate the general process the same answer will be 
obtained in a less direct way. 

Two permutations differ (a) when they are different combinations
and (b) when they are different arrangements
(i.e., permutations) of the same combination. This is a matter
of definition. Suppose x is the number of combinations of
four things three at a time. Each of these combinations is
a group of three, and is capable of a certain number of different
arrangements. Call this number y. Every other combination
is also capable of y permutations, thus making xy
in all. Now it is obvious by definition that no two of these
are identical permutations; and it may easily be seen that
every possible permutation is included, since any possible
permutation is a particular arrangement of some combination,
all of which have been counted. But it is already known
that there are just 24 permutations. Hence


xy = 24. (1) 


As for y itself: The first letter of a group of three can be
chosen in three ways, the second in two, and the third in one.
Hence y = 3·2·1 = 6. Substituting this value in (1) it is
found that x = 4. This is the same result as before.
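
Both routes to the answer can be checked by enumeration. A sketch, with illustrative variable names:

```python
from itertools import combinations, permutations

letters = "abcd"

# Indirect route: xy = 24 with y = 6.
total_permutations = len(list(permutations(letters, 3)))
y = len(list(permutations("abc")))       # arrangements of one group of three
x = total_permutations // y

# Direct route: one combination for each letter left over.
combos = ["".join(c) for c in combinations(letters, 3)]
left_over = [(set(letters) - set(c)).pop() for c in combos]

print(x, combos, left_over)
```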


§ 10. Application of the General Laws of Composition to Permu- 
tations; General Theorems 


The processes used in Examples 1 and 2 are perfectly general
and make it possible to obtain formulas for the permutations
and combinations of groups of objects.

First consider a group of m different objects, and attempt
to find the number of permutations of n objects each which
can be made from this group of m.

Since the objects are all distinct, the first one in order
can be chosen in m ways. Thereafter the second can be
chosen in m − 1 ways, the third in m − 2 ways and so on.
In general the number of ways of choosing an object is m
minus the number of objects already chosen. When the choice
of the last, or nth, object is reached, n − 1 will already have
been chosen; and therefore the last choice can take place in
m − n + 1 ways. The entire group of possible choices therefore
numbers

m(m − 1)(m − 2) ... (m − n + 1). (2)


This is a perfectly general formula. If m = 4 and n = 3
the problem treated in Example 1 is obtained. In this case
m − n + 1 is 2 and the answer is the product of all the integers
from 2 to 4 inclusive. This, of course, is the same result as
was obtained before.
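
Formula (2) translates directly into a loop over the successive choices. The function name `permutations_count` is an illustrative choice; `math.perm` (Python 3.8 and later) is used only as an independent check.

```python
import math

def permutations_count(m, n):
    """m(m - 1)(m - 2) ... (m - n + 1): one factor for each choice."""
    result = 1
    for already_chosen in range(n):
        result *= m - already_chosen     # ways of making the next choice
    return result

print(permutations_count(4, 3))          # Example 1
print(permutations_count(4, 3) == math.perm(4, 3))
```

When n exceeds m one of the factors is zero, so the function returns zero, as it should when more objects are asked for than are available.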


An important special case of (2) is that in which m and n
are equal. In this case m − n + 1 = 1 and (2) becomes

m(m − 1)(m − 2) ... 2·1. (3)
This is the number of different ways in which a group of m 
unlike objects can be arranged, or, in technical terms, “ the 
number of permutations of m things m at a time.” 

A slightly more general problem is: A group contains s
kinds of objects. There are $m_1$ of the first kind, $m_2$ of the
second kind, and so on, the total number of objects being
$m = m_1 + m_2 + \cdots + m_s$. In how many distinct ways can
they be arranged?

What is meant by saying that m objects are “of the same
kind,” is that interchanging two of these objects makes no
difference in the arrangement of the group. For instance, 
a a b b is a certain group: it may be thought of as a row of 
alphabet blocks. If the blocks containing the a’s are inter- 
changed the group is still a a b b, and is unaltered. 


Suppose the answer to the problem is x. Let one of these
x permutations be chosen, and the objects of like kind be
tagged to establish their identity. By this means all the
objects are rendered distinct. Then the $m_1$ objects of the
first kind will be capable of $m_1(m_1 - 1) \cdots 2\cdot 1$ permutations
among themselves, none of which, however, would have been
different from the one chosen if the objects had not been
tagged. Those of the second kind will likewise be capable of
$m_2(m_2 - 1) \cdots 2\cdot 1$ permutations among themselves, any of
which may be associated with any of the permutations of the
objects of the first kind without altering the original permutation
of the untagged objects. A similar statement can obviously
be made for each of the s kinds of objects. Hence for
each of the original permutations there is now a total of

\[ (1\cdot 2 \cdots m_1)(1\cdot 2 \cdots m_2) \cdots (1\cdot 2 \cdots m_s) \]

permutations, all of which were made possible by tagging the
objects. This makes a total of

\[ (1\cdot 2 \cdots m_1)(1\cdot 2 \cdots m_2) \cdots (1\cdot 2 \cdots m_s)\, x. \qquad (4) \]

It must now be noted that every possible permutation of
the $m = m_1 + m_2 + \cdots + m_s$ tagged objects is included in
this way: for if there were another one, and if the objects
were arranged in this permutation and the tags taken off, the
result would of necessity be one of the untagged permutations.
All possible arrangements which can be made from these,
however, have already been provided for in (4). Hence this
“other one” must be identical with one of those already
provided for.

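Finally, it is already known from (3) that the number of
permutations of the tagged objects is

\[ 1\cdot 2\cdot 3 \cdots m. \]

Hence, from (4),

\[ x = \frac{1\cdot 2\cdot 3 \cdots m}{(1\cdot 2 \cdots m_1)(1\cdot 2 \cdots m_2) \cdots (1\cdot 2 \cdots m_s)}. \qquad (5) \]

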
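
Formula (5) can be compared with a direct count for small groups such as a a b b. The sketch below uses illustrative function names; the brute-force check plays the part of tagging the objects, arranging them in every possible way, and then removing the tags.

```python
from itertools import permutations
from math import factorial

def arrangements(counts):
    """Formula (5): m! divided by the product of the factorials of the
    numbers of objects of each kind."""
    m = sum(counts)
    denominator = 1
    for count in counts:
        denominator *= factorial(count)
    return factorial(m) // denominator

def arrangements_by_listing(group):
    """Arrange the (tagged) objects in all m! ways, then drop duplicates,
    which is what removing the tags amounts to."""
    return len(set(permutations(group)))

print(arrangements([2, 2]), arrangements_by_listing("aabb"))
print(arrangements([3, 1]), arrangements_by_listing("abaa"))
```

The listing grows as m!, so it is practical only as a check on small cases; the formula is what one would use in earnest.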
PROBLEMS 


1. How many permutations of the letters of the word “concatenation”
are possible?


2. A firm has four positions available, and a list of eleven appli- 
cants. How many possible ways are there of filling them? 


3. A horseshoe contains eight nails. In how many different 
orders may they be driven? If enough horses are to be provided so 
that one shoe may be attached in every possible way, how many 
miles of four-foot stalls would be required to accommodate them? 


4. There are available $m_1$ objects of one kind, $m_2$ of another, and
so on. How many possible permutations can be built up using $n_1$
of the first kind, $n_2$ of the second, and so on? It is assumed that
each m is at least as big as the corresponding n.


5. What is the answer to Problem 4 if one n, say $n_1$, exceeds the
corresponding m?


§ 11. Factorials; the Gamma Function


The combination of numbers which presents itself in (3) is a very
common one in more than one branch of mathematics; so much so
that it has been given a name and a shorthand symbol. It is called
“factorial m,” and is usually written as m! in modern books. Some
years ago a more common notation was ⌊m.


Factorial m may therefore be defined as the product of all integers
from 1 to m, inclusive. When so defined, it has no meaning unless
m is an integer; but this defect can easily be remedied by certain
artifices.


When m is an integer, it is easily shown by direct integration that

\[ \int_0^\infty x^m e^{-x}\,dx = m!. \qquad (6) \]


As yet, when m is fractional, m! means nothing, and it is therefore
impossible to say that the equation (6) is either true or false. If,
however, (6) instead of (3) is adopted as the definition of m!, the
symbol will have the same significance as before for integers, and
it will also have a meaning for fractions. This is the definition
usually adopted.¹

Even this definition applies only when m > −1, for the integral
does not have a value for other values of m. The integral obeys a 
certain law, however, by means of which the definition can be extended 
to all real numbers. 

Integrating (6) by parts, it is found that

\[ \int_0^\infty x^m e^{-x}\,dx = -x^m e^{-x}\Big|_0^\infty + m\int_0^\infty x^{m-1} e^{-x}\,dx. \]

As the second term vanishes at both limits of integration,² this
equation may be rewritten


m! = m(m — 1)!. (7) 


1 As a matter of fact, a much more complicated definition is given for purely
mathematical purposes, in order to overcome the limitation to which the next paragraph
of text calls attention. So far as these notes are concerned, however, the
definition (6) is entirely satisfactory, and the student will not be misled by it.

2 The second term obviously vanishes when x = 0. When x = ∞ it takes the form
∞·0, which is indeterminate. To evaluate this “indeterminate form,” the usual
process is to replace $x^m e^{-x}$ by $x^m/e^x$, which is identical with it, but takes the form ∞/∞.
The next step is to differentiate numerator and denominator separately, getting

\[ \frac{m x^{m-1}}{e^x}. \]

This is still in the form ∞/∞, so another differentiation is performed, and then another
and so on. The denominator in each successive step remains the same, but each step
reduces the degree of the numerator by 1. After n differentiations the form appears

\[ \frac{m(m-1)(m-2)\cdots(m-n+1)\,x^{m-n}}{e^x}. \]

If m is an integer, by setting n = m the numerator becomes a finite constant and
the fraction (for x = ∞) takes the form C/∞ = 0.

If m is a fraction, n can be chosen as the next larger integer, in which case the
fraction takes the form

\[ \frac{m(m-1)(m-2)\cdots(m-n+1)}{x^{n-m}\,e^x}. \]

As n − m is positive, both terms in the denominator become infinite with x, and a
fortiori their product does. Hence, as before, the fraction takes the form C/∞ = 0.
Those who wish to renew their acquaintance with this process may look up the
subject of “indeterminate forms” in any Calculus textbook; e.g., March and Wolf,
Calculus, pp. 324-327.


This law can be more easily obtained from (3) when m is integral;
having been obtained from (6) it is true whether m is integral or not.
Moreover, if it is taken as a universal property of m! it makes it
possible to define m! when m is negative.

For instance: it is known¹ that $\left(\tfrac{1}{2}\right)! = \tfrac{1}{2}\sqrt{\pi}$. By setting $m = \tfrac{1}{2}$
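in (7), and inserting this known value, it is found that

\[ \tfrac{1}{2}\sqrt{\pi} = \tfrac{1}{2}\left(-\tfrac{1}{2}\right)! \]

or

\[ \left(-\tfrac{1}{2}\right)! = \sqrt{\pi}. \]

Similarly by putting $m = -\tfrac{1}{2}$,

\[ \sqrt{\pi} = -\tfrac{1}{2}\left(-\tfrac{3}{2}\right)! \]

or

\[ \left(-\tfrac{3}{2}\right)! = -2\sqrt{\pi}. \]

By starting from a suitable positive factorial and using this device
it is possible to obtain the factorial of any negative number whatever.

The factorials of negative integers are particularly interesting.
It is obvious that 1! = 1. Setting m = 1 in (7) it is found that 0!
also equals unity. Then putting m = 0 in (7)

\[ 0! = 0\cdot(-1)! \]

or

\[ (-1)! = \frac{1}{0} = \infty. \]

By the same process:

\[ (-2)! = \frac{(-1)!}{(-1)} = \infty, \]

\[ (-3)! = \frac{(-1)!}{(-1)(-2)} = \infty, \]

\[ (-4)! = \frac{(-1)!}{(-1)(-2)(-3)} = \infty, \]

and so on. Since infinity divided by any succession of finite quantities
is still infinite, it follows that the factorial of any negative integer
is infinite.


1 By definition

\[ \left(\tfrac{1}{2}\right)! = \int_0^\infty x^{1/2} e^{-x}\,dx. \]

Replace x by $y^2$. Then

\[ \left(\tfrac{1}{2}\right)! = 2\int_0^\infty e^{-y^2} y^2\,dy. \]

But it makes no difference whether the variable is y or z. Hence

\[ \left(\tfrac{1}{2}\right)! = 2\int_0^\infty e^{-z^2} z^2\,dz. \]

Now, multiply these equations member by member and place the z’s under the sign
of y-integration, with respect to which they are constant.

\[ \left[\left(\tfrac{1}{2}\right)!\right]^2 = 4\int_0^\infty\!\!\int_0^\infty e^{-(y^2+z^2)}\, y^2 z^2\,dy\,dz. \]

Finally, note that in the yz-plane dy dz is the element of area, and that the integral
extends over the entire first quadrant. Rewriting the integral in terms of polar
coordinates, which means (see § 68)

\[ y = r\cos\theta, \qquad z = r\sin\theta, \qquad dy\,dz = r\,dr\,d\theta, \]

it becomes

\[ \left[\left(\tfrac{1}{2}\right)!\right]^2 = 4\int_0^\infty r^5 e^{-r^2}\,dr \int_0^{\pi/2} \cos^2\theta\,\sin^2\theta\,d\theta. \]

But

\[ \int_0^{\pi/2} \cos^2\theta\,\sin^2\theta\,d\theta = \frac{\pi}{16}, \]

and

\[ \int_0^\infty r^5 e^{-r^2}\,dr = 1. \]

Hence

\[ \left[\left(\tfrac{1}{2}\right)!\right]^2 = \frac{\pi}{4}, \qquad \left(\tfrac{1}{2}\right)! = \tfrac{1}{2}\sqrt{\pi}. \]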

The behavior of the function will be made clearer by reference
to Fig. 2, in which the factorials of numbers between −6 and +3
are plotted.

Fig. 2. The Factorial.

For completeness one more fact should be mentioned. It is customary
among mathematicians to speak of “factorials” only
when referring to the factorials of positive integers.
In other cases they talk about a “gamma-function” or
“Γ-function.” This function is related to the factorial discussed
above by the law


Γ(m) = (m − 1)!.


Thus Γ(3) = 2! = 2·1 = 2; and Γ(1/2) = (−1/2)! = √π.
As the multiplication of names and symbols serves no
useful purpose, all such numbers will be called “factorials” in
what follows.

From the above discussion the student should retain the following
facts:


(a) Factorial m, when m is an integer, means the product of the
integers from 1 to m. It is written m! or ⌊m.


(b) The symbol has a meaning for other numbers than integers.
Its value may be found tabulated, just as logarithms are.

(c) The factorial of zero is equal to unity. [0! = 1.]

(d) The factorial of a negative integer is infinite. [(−j)! = ∞,
j being a positive integer.]

(e) The factorials of negative fractional numbers are not infinite.
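
These facts can be checked with the gamma function of the Python standard library, using the law Γ(m) = (m − 1)!, that is, m! = Γ(m + 1). This is a sketch under that assumption; the helper names are illustrative. Note that `math.gamma` does not return infinity at the poles but raises `ValueError`, which the second helper treats as "infinite."

```python
import math

def factorial_general(m):
    """m! for real m, taken as Gamma(m + 1)."""
    return math.gamma(m + 1)

def factorial_is_infinite(m):
    """True where the library refuses to give a finite value."""
    try:
        factorial_general(m)
    except ValueError:
        return True
    return False

half = factorial_general(0.5)                  # compare with sqrt(pi)/2
minus_half = factorial_general(-0.5)           # compare with sqrt(pi)
minus_three_halves = factorial_general(-1.5)   # compare with -2 sqrt(pi)

print(half, minus_half, minus_three_halves)
print(factorial_general(0), factorial_general(5))
print([factorial_is_infinite(-j) for j in (1, 2, 3)])
print(factorial_is_infinite(-2.5))
```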


There will be many occasions to use factorials in what follows,
but they will generally be the factorials of integers. To simplify
numerical calculations, the values of the factorials of integers up to
200 are listed in Appendix I. As many of the numbers are inconceivably
large, only a few significant figures are written, followed by
the power of ten by which they must be multiplied. Thus 20! is
given as 2.432 9020^18, which means


2.432 9020·10^18 = 2,432,902,000,000,000,000.


In Appendix I are given the logarithms of the factorials up to 1200. 
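
Entries of the kind described can be regenerated when the tables are not at hand. The following is a sketch only: the function names and the choice of eight significant figures are assumptions made for illustration, and the logarithm is obtained from `math.lgamma`, the natural logarithm of the gamma function.

```python
import math

def tabulated(m, figures=8):
    """Return (significant figures, power of ten) for m!."""
    value = math.factorial(m)
    power = len(str(value)) - 1              # exponent of the leading digit
    leading = value / 10 ** power            # a number between 1 and 10
    return round(leading, figures - 1), power

def log10_factorial(m):
    """Common logarithm of m!, computed without forming m! itself."""
    return math.lgamma(m + 1) / math.log(10)

print(tabulated(20))
print(log10_factorial(20))
print(log10_factorial(1200))
```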


§ 12. Restatement of General Theorems in Permutations 


Let the symbol $P_n^m$ stand for the number of permutations
of m things n at a time. Then (2) may be rewritten as

\[ P_n^m = m(m-1)\cdots(m-n+1) = \frac{m(m-1)\cdots(m-n+1)(m-n)(m-n-1)\cdots 2\cdot 1}{(m-n)(m-n-1)\cdots 2\cdot 1}. \]

Making use of the factorial notation this reduces to

\[ P_n^m = \frac{m!}{(m-n)!}. \qquad (2) \]

Similarly (3), which represents the number of permutations
of m objects m at a time, becomes

\[ P_m^m = m!. \qquad (3) \]

Finally (5), which gives the number of possible arrangements
of a group composed of s kinds of objects, $m_1$ of the
first kind, $m_2$ of the second, and so on, is

\[ \frac{m!}{m_1!\,m_2!\cdots m_s!}. \qquad (5) \]


§ 13. Application of the General Laws to Combinations 


The number of combinations of m things taken n at a
time can be computed by the same line of argument as was
used in Example 2 of § 9. Each combination is capable of
n! permutations within itself. If, therefore, the number of
combinations is denoted by $C_n^m$, the total number of permutations
will be

\[ P_n^m = n!\,C_n^m. \]

The value of $P_n^m$, however, is known from (2). Hence

\[ C_n^m = \frac{m!}{n!\,(m-n)!}. \qquad (8) \]

This is the general formula for the number of combinations
of m things taken n at a time. In particular, if m is 4 and n
is 3, the case is the same as that considered in Example 2
of § 9. Substituting these values in (8) the answer is found to
be

\[ \frac{4!}{3!\,1!} = \frac{4\cdot 3\cdot 2\cdot 1}{3\cdot 2\cdot 1\cdot 1} = 4, \]

which checks the result obtained in § 9.
Some further examples may be given.


Example 3. How many straight lines can be drawn through five
points in such a way that each line contains two points?


A straight line can be drawn through any pair of points
whatever. There will therefore be as many straight lines as
there are pairs of points. The only question which remains
for consideration is as to whether the pairs of points $p_1 p_2$ and
$p_2 p_1$ are to be considered as identical or distinct. In the
former case the problem becomes a problem in combinations;
in the latter it is a problem in permutations.

It is obviously possible to draw a line from a to b and
also one from b to a, but in the sense in which lines are usually
thought of these two lines would be identical. In other words,
the direction in which the pencil travels in marking out the
line is of no consequence. Hence $p_1 p_2$ and $p_2 p_1$ must be
thought of as identical groups and the problem becomes a
problem in combinations. Its solution is therefore¹

\[ C_2^5 = \frac{5!}{2!\,3!} = 10. \]


1 There are exceptions to this answer. For instance, if all five points lie upon the 
X-axis of a system of co-ordinates there are not ten lines which can be drawn through 
them. Instead the X-axis itself is the only possible line. In obtaining the answer 
above it is tacitly assumed that no three points are collinear. 




EXAMPLE 4.—How many distinct hands of 13 cards can be dealt
from a full pack without a joker?


This is obviously a question of combinations since the order 
in which the cards are dealt is immaterial. The answer is 
$$C_{13}^{52} = \frac{52!}{13!\,39!} = 635,013,559,600.$$
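Examples 3 and 4, together with the footnote on collinear points, can be illustrated by a short Python sketch. The point coordinates and the helper `distinct_lines` are hypothetical choices made for the illustration; `math.comb` is the standard-library binomial coefficient.

```python
from itertools import combinations
from math import comb, gcd

def distinct_lines(points):
    """Count distinct straight lines through at least two integer points."""
    lines = set()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        a, b = y2 - y1, x1 - x2            # line written as a*x + b*y = c
        c = a * x1 + b * y1
        g = gcd(gcd(a, b), c)              # reduce to lowest terms
        a, b, c = a // g, b // g, c // g
        if a < 0 or (a == 0 and b < 0):    # fix the sign: one key per line
            a, b, c = -a, -b, -c
        lines.add((a, b, c))
    return len(lines)

general = [(0, 0), (1, 0), (0, 1), (2, 3), (3, 1)]   # no three collinear
on_axis = [(i, 0) for i in range(5)]                 # all five on the X-axis

print(distinct_lines(general), comb(5, 2))   # pairs of points give lines
print(distinct_lines(on_axis))               # the exceptional case
print(comb(52, 13))                          # Example 4
```

The first set of points gives as many lines as there are pairs; the second shows why the count of pairs is only an upper bound when three or more points are collinear.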


PROBLEMS 


1. How many possible stacks of 7 dominoes can be dealt from a 
set of 28? 


2. There are 11 candidates from among whom 200 electors are 
to choose a board of directors. Half of the electors have two votes 
each to one each for the rest. The board is to consist of five members.
On each ballot the lowest man is discarded, until only five remain. 
How many boards are possible? 


§ 14. Some General Properties of $C_n^m$; Pascal's Triangle


The first solution of Example 2, § 9, was obtained by noting 
that whenever a different combination of three letters was 
taken out of the four supplied, a different combination of one 
was left. A little reflection shows that this is a general law; 
whenever a different combination of n things is taken from a
group of m a different combination of m − n things remains.
There must therefore be at least as many distinct combinations
of m − n as there are of n, that is,

$$C_{m-n}^m \ge C_n^m. \qquad (9)$$

This rule, however, works backward just as well as forward.
Whenever a different group of m − n is taken out, a different
group of n remains. Hence, the distinct groups of n must
be at least as numerous as the distinct groups of m − n. This
leads to

$$C_n^m \ge C_{m-n}^m. \qquad (10)$$

By comparing (9) and (10) it is seen that both can be true
only when the equality signs are used. Hence it follows as
a general rule that¹

$$C_n^m = C_{m-n}^m. \qquad (11)$$


Another general property of the quantities $C_n^m$ can be
obtained from the obvious equation

$$(m - n)\,(m - 1)! + n\,(m - 1)! = m!.$$

If this equation is divided throughout by both n! and (m − n)!
and a few obvious cancellations are made, it reduces to

$$C_n^{m-1} + C_{n-1}^{m-1} = C_n^m. \qquad (12)$$


This equation belongs to the class known as “recursion
formulae.” In other words, it is of such a nature that once the
combinations that can be made from a group of m − 1 things
are all known it is possible to find the combinations that can
be made from m things. For instance, in the column headed
5 in Table II, 5 is the number of combinations of one thing
each which can be made from a group of five distinct objects;
10 is the number of combinations of two each; and so on, as
indicated by the marginal numbers.²

Similarly the column headed 6 contains the values of $C_n^6$.
It can be obtained from the first column by the use of (12).
For upon putting m equal to 6, (12) becomes

$$C_n^6 = C_n^5 + C_{n-1}^5,$$
1 This rule can also be derived as an immediate consequence of (8), for

$$C_{m-n}^m = \frac{m!}{(m-n)!\,[m-(m-n)]!} = \frac{m!}{(m-n)!\,n!} = C_n^m.$$

The logical argument, however, may carry somewhat greater conviction than the
formal manipulation of symbols because it makes it possible to “see why” the answer
is correct.

2 There is also added, opposite the marginal number 0, the value taken by (8)
when n = 0. This value is unity, regardless of the value of m, so that the entire row
is filled with 1’s. Of course, this quantity does not make sense when phrased as “the
number of different groups of no things each which can be formed from a group of
m things”; it must be regarded as an extension of the meaning of the symbol $C_n^m$ by
definition. It is required for the general validity of (11) and (12).


which says that each entry in the column headed 6 is the
sum of the adjoining number in the 5 column, and the number
next above the latter. Thus $C_1^6$ is the sum of 5 and 1; $C_2^6$
is the sum of 10 and 5, and so on.
The 7 column can be obtained in the same way from the 
6 column. 
TABLE II

VALUES OF $C_n^m$

  n     m = 5    m = 6    m = 7
  0       1        1        1
  1       5        6        7
  2      10       15       21
  3      10       20       35
  4       5       15       35
  5       1        6       21
  6       0        1        7
  7       0        0        1


Appendix III contains the values of $C_n^m$ for groups not
larger than 100. Because of the large numbers involved this
table is written in the same notation as that which was used
in Appendix I. Moreover, the size of the table has been
reduced by making use of the property (11), according to which
the numbers in any column repeat themselves in the inverse
order after the center of the column is passed. For this
reason it is only necessary to tabulate the numbers up to the
point where they begin repeating. For instance, $C_{31}^{50}$ is not
given in the table, but is known from (11) to be equal to $C_{19}^{50}$,
which is given as $3.0405943^{13}$, that is, $3.0405943 \times 10^{13}$.
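The construction of Table II from the recursion formula (12), and the symmetry (11) used to shorten Appendix III, can be sketched in a few lines of Python. The helper name `next_column` is an illustrative assumption; entries outside a column are taken as zero, in keeping with the zeros shown in the table.

```python
from math import comb

def next_column(prev):
    """Apply (12): each new entry is the adjoining entry of the previous
    column plus the entry next above it (missing entries count as 0)."""
    padded = [0] + prev + [0]
    return [padded[i] + padded[i + 1] for i in range(len(prev) + 1)]

col5 = [1, 5, 10, 10, 5, 1]    # n = 0..5, the m = 5 column of Table II
col6 = next_column(col5)       # n = 0..6
col7 = next_column(col6)       # n = 0..7

print(col6)
print(col7)
print(col7 == col7[::-1])                  # property (11): column reads the same backward
print(comb(50, 31) == comb(50, 19))        # the Appendix III illustration
print(f"{comb(50, 19):.7e}")               # same value in powers-of-ten form
```

Only addition is needed to pass from one column to the next, which is what makes (12) convenient for building a table by hand.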


§ 15. Some General Properties of $C_n^m$; The Binomial Theorem


The rule for algebraic multiplication is, that every term 
of the one factor of the product is to be multiplied by every 
term of the other factor of the product, and all of these partial 
products are then to be added together. This is what mathematicians
call the “distributive law of multiplication.” It
can also be phrased in the following words: “The product of
two polynomials is the sum of the partial products of all possible
combinations of terms obtained by taking one term from
each of the factors.”

Suppose a product of two factors has been obtained in this
way and that this product is then multiplied by a third polynomial.
The product thus obtained contains the sum of all
possible partial products of three terms each, which can be
formed by taking one term from each of the three polynomials.

In general, when m factors are involved the product must 
contain the sum of every possible partial product which can 
be formed by taking one term from each of the m polynomials. 

Suppose now that each of the m factors is taken as the
same binomial x + y. The product will then represent
$(x + y)^m$. Since there are only two possible terms, x and y,
every partial product must be a power of x multiplied by a
power of y. Furthermore, since one term must be taken
from each factor, the sum of the two powers must always be
equal to m. In other words, the product, when it has been
completely worked out, must take the form

$$c_0 x^m + c_1 x^{m-1} y + c_2 x^{m-2} y^2 + \dots + c_m y^m, \qquad (13)$$

where the values of the c’s are as yet unknown.

However, it is not difficult to determine the numerical
values of these c’s, for the complete product must contain
just as many partial products of the form $x^n y^{m-n}$ as there are
different ways of choosing n x’s from m different factors, the
order of choice being immaterial; in other words, it is $C_n^m$.
The y’s, of course, come from the remaining m − n factors.
That is, the coefficient of the nth power of x in the binomial
expansion of $(x + y)^m$ is equal to the number of combinations
of m things taken n at a time.

Substituting this value in (13) it takes the form

$$C_0^m x^m + C_1^m x^{m-1} y + C_2^m x^{m-2} y^2 + \dots + C_m^m y^m, \qquad (14)$$

or in shorthand notation¹

$$(x + y)^m = \sum_{n=0}^{m} C_n^m x^{m-n} y^n. \qquad (15)$$


Referring back to Appendix III the reader will at once
recognize the sequences of numbers

1, 1
1, 2, 1
1, 3, 3, 1

as belonging to the well-known binomial expansions

$$(x + y)^1 = 1\cdot x + 1\cdot y$$
$$(x + y)^2 = 1\cdot x^2 + 2\cdot xy + 1\cdot y^2$$
$$(x + y)^3 = 1\cdot x^3 + 3\cdot x^2 y + 3\cdot x y^2 + 1\cdot y^3$$

Even the integer 1 standing alone under the heading m = 0
fits into this arrangement because $(x + y)^0 = 1$.
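The argument of this section can be traced numerically by carrying out the multiplication term by term, exactly as the distributive law prescribes, and comparing the resulting coefficients with $C_n^m$. The sketch below is in Python; the representation of a polynomial as a list of coefficients indexed by the power of y (the power of x being implied, since every term is of total degree m) and the function names are assumptions made for the illustration.

```python
from math import comb

def multiply(p, q):
    """Product of two coefficient lists: every term of p times every term of q."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def binomial_coefficients(m):
    """Coefficients of (x + y)^m, indexed by the power of y."""
    poly = [1]                          # (x + y)^0 = 1
    for _ in range(m):
        poly = multiply(poly, [1, 1])   # one more factor x + y
    return poly

for m in range(4):
    print(m, binomial_coefficients(m))

# Agreement with the general term of (14) for a larger power
print(binomial_coefficients(10) == [comb(10, n) for n in range(11)])
```

Nothing in `multiply` refers to combinations; the numbers $C_n^m$ appear only because of the way partial products collect.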


1 The Σ-notation for the sum of a number of terms all of which follow the same
law of formation is very convenient and enables us to write quite complicated
expressions in a comparatively simple form. This notation will be frequently used
in what follows. The idea upon which it is based can be easily illustrated by reference
to (15).

The product

$$C_n^m x^{m-n} y^n$$

is called the “general term” of (14). This means that by assigning different values
to n it is possible to obtain every term of (14). For instance if n is put equal to
2 the term

$$C_2^m x^{m-2} y^2$$

is obtained. If n is put equal to 1 the term

$$C_1^m x^{m-1} y$$

is obtained. Proceeding in this fashion and taking the proper set of values for n,
every term of (14) may be duplicated.

The Σ of (15) is a command to add together the proper set of terms of this sort.
The symbols written above and below it define “the proper set” by telling the
largest and smallest values of n respectively. It is understood that all intermediate
integers are to be used.

Thus the right-hand side of (15) is shorthand notation for the instruction “build
up a term of the form

$$C_n^m x^{m-n} y^n$$

for every integral value of n from 0 to m inclusive, and add all these terms together.”




§ 16. The Solution of More Complicated Problems 


There are many problems in permutations and combinations
which do not fall into the simple classes dealt with above.
Such problems can often be solved by the intelligent use of the
general laws stated in § 6. A few examples will illustrate the
general line of attack.


EXAMPLE 5.—How many combinations of four letters each can be
made from the word pepper?


This example is complicated by the fact that some letters
are repeated. The simplest method of dealing with it is to
note that there are two general classes of combinations, those
which contain r and those which do not.

Those which contain r must also contain some combination
of three formed from three p’s and two e’s. It is a simple
matter to list these combinations, which turn out to be

    p p p,
    p p e,
    p e e.

Those groups which do not contain r must have four letters
chosen from the three p’s and the two e’s. It is evident that
there are only two such combinations:

    p p p e,
    p p e e.


Since the cases which contain r and those which do not are 
mutually exclusive, the first general law of composition gives 
the total number of combinations as five. 
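The listing in Example 5 can be reproduced mechanically. In the Python sketch below (the function name `distinct_combinations` is an illustrative assumption), groups of four are drawn from the six letter positions and then sorted, so that groups differing only in which copy of a repeated letter was used collapse into one.

```python
from itertools import combinations

def distinct_combinations(word, k):
    """Distinct k-letter combinations when like letters are indistinguishable."""
    return {"".join(sorted(group)) for group in combinations(word, k)}

pepper = distinct_combinations("pepper", 4)
with_r = {c for c in pepper if "r" in c}    # first class: contains r
without_r = pepper - with_r                 # second class: does not

print(sorted(with_r))
print(sorted(without_r))
print(len(with_r) + len(without_r))         # mutually exclusive classes add
```

The two classes share no member, which is the condition under which the first general law of composition allows their counts to be added.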

In solving this example it was not necessary to make use
of any of the formulae which have been derived for permutations
and combinations. An illustration in which those formulae
are useful as an adjunct to the general laws is the following:


EXAMPLE 6.—How many combinations of four letters each can be 
made from the letters of the word provocative ? 


This word has eleven letters, two o’s, two v’s, and one each
of the seven letters a, c, e, i, p, r, t.

The possible combinations of the o’s and v’s are

    oovv    oov    oo    o
            ovv    ov    v
                   vv

The first column contains a combination of four; it is therefore
one of the cases sought. The second column contains combinations
of three, and by the second law of composition it is
known that each of these can be combined with every possible
combination of one formed from the seven distinct characters.
Likewise each of the combinations in the third column can be
combined with the $C_2^7$ combinations of 2, and each of the
fourth column with the $C_3^7$ combinations of 3. Finally there
are $C_4^7$ combinations of four letters containing neither o’s
nor v’s. Using the first law, the total number of combinations
is found to be

$$1 + 2\,C_1^7 + 3\,C_2^7 + 2\,C_3^7 + C_4^7.$$


This works out to be 183 combinations in all. 
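The total for Example 6 can be confirmed in two independent ways: by evaluating the sum term by term, and by enumerating the groups of four outright. This is a short Python sketch using only the standard library; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

# Column by column: 1 group of four, 2 groups of three, 3 of two, 2 of one,
# each joined with combinations of the seven distinct letters, plus the
# groups containing neither o nor v.
by_formula = (1
              + 2 * comb(7, 1)
              + 3 * comb(7, 2)
              + 2 * comb(7, 3)
              + comb(7, 4))

# Brute force: draw 4 of the 11 letter positions, then ignore which copy of
# a repeated letter was drawn by sorting the letters.
by_enumeration = len({"".join(sorted(g)) for g in combinations("provocative", 4)})

print(by_formula, by_enumeration)
```

The enumeration examines only 330 selections, so the check is immediate; for longer words the formula is the practical route.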

It is frequently necessary in considering problems in permutations
and combinations to make use of artifices of this
sort. In such cases the ease with which the solution is obtained
depends very largely upon the appropriateness of the method
of attack, so that experience and that sense which we call
intuition are necessary before such problems can be satisfactorily
handled.


§ 17. A Complicated Problem in Permutations


The following problem is introduced for two reasons: In the first
place it leads to a result which is of some interest on its own account;
in the second it affords an excellent drill in the purely formal thinking
that is often associated with the use of the Σ notation.

It deals with a question similar to that treated in Examples 5
and 6, except that permutations instead of combinations are asked for.
It may be stated as follows:


EXAMPLE 7.—How many different permutations of n things each
can be made from $m_1$ of one kind, $m_2$ of a second, and so on, there being
s different kinds available? No restriction is placed upon the proportion
of each kind to be included in the permutations.


Now, though the exact make-up of any permutation is unknown,
it must have some number of objects of each kind. Let them be
denoted by $n_1, n_2, \dots, n_s$. Then according to the result of
Problem 4, § 10, there are just

$$\frac{n!}{n_1!\,n_2! \cdots n_s!} \qquad (16)$$

permutations for which the n’s remain unchanged. Obviously the
total number of permutations, $P_n^{m_1, m_2, \dots, m_s}$, will be obtained by adding
together terms of this sort, one term for every possible way in
which a group of n may be made up. The result must therefore take
the form

$$P_n^{m_1, m_2, \dots, m_s} = \sum_{n_1} \sum_{n_2} \cdots \sum_{n_s} \frac{n!}{n_1!\,n_2! \cdots n_s!},$$

provided the proper limits are assigned to the summations.

These limits are found as follows:

In the first place, since

$$n_1 + n_2 + \dots + n_s = n,$$

one of the n’s, say $n_s$, is not arbitrary, but has its value fixed as soon
as the others are known. The $n_s$-summation therefore consists of but
a single term for which

$$n_s = n - n_1 - n_2 - \dots - n_{s-1}.$$

But if there is but a single term the sign of summation may be
dropped. This gives

$$P_n^{m_1, m_2, \dots, m_s} = \sum_{n_1} \sum_{n_2} \cdots \sum_{n_{s-1}} \frac{n!}{n_1!\,n_2! \cdots n_{s-1}!\,(n - n_1 - n_2 - \dots - n_{s-1})!}. \qquad (17)$$

Next, it may be noted that $n_1$ cannot exceed $m_1$, $n_2$ cannot exceed
$m_2$, and so on. This suggests that the upper limits of summation
shall be $m_1, m_2, \dots, m_{s-1}$. But caution must be observed in this
matter. For if some one m is greater than n this would seem to
require the use of permutations in which the number of objects of
one kind exceeded the total number of all kinds, which is obviously
impossible.


However, if any $n_i$, say $n_1$, exceeds n, the term

$$(n - n_1 - n_2 - \dots - n_{s-1})!$$

becomes the factorial of a negative integer, which is known to be infinite.
In fact this occurs whenever the sum $n_1 + n_2 + \dots + n_{s-1}$
exceeds n. As this factorial occurs in the denominator of (17), it
follows that whenever a set of values of $n_1, n_2, \dots, n_{s-1}$ is used,
which is impossible because the total exceeds n, the corresponding
term in (17) automatically becomes zero. Hence, even though
$m_1, m_2, \dots, m_{s-1}$ may be bigger than the correct limits, they may
nevertheless be used, for the additional terms which they introduce
into (17) are all zeros.

Finally the lower limits of summation must be obtained. To this
end, consider first the case of $n_1$, and think of a particular illustration
in which n = 7 and the groups of like objects are nine a’s, two b’s
and three c’s: that is, $m_1 = 9$, $m_2 = 2$ and $m_3 = 3$. Obviously, since
there cannot be more than two b’s and three c’s in any permutation
there must be at least two a’s. That is, $n_1$ cannot be less than 2.

By the same line of argument, since $n_2, n_3, \dots, n_s$ cannot exceed
$m_2, m_3, \dots, m_s$, respectively, it follows that their sum cannot exceed
$m_2 + m_3 + \dots + m_s$. But as the total number of objects in a permutation
is fixed at n, the smallest number of objects of the first
kind will occur in those permutations which have the largest aggregate
number of objects of all other kinds. That is, $n_1$ cannot be smaller
than

$$n - m_2 - m_3 - \dots - m_s. \qquad (18)$$

The same argument applies to $n_2$, except that, in carrying out the
second summation the value of $n_1$ is supposed to be known, and
therefore only the remaining n’s must be made as large as possible.
Thus $n_2$ cannot be smaller than

$$n - n_1 - m_3 - m_4 - \dots - m_s.$$

Similarly the other n’s must satisfy the following inequalities:

$$n_3 \ge n - n_1 - n_2 - m_4 - \dots - m_s,$$
$$n_4 \ge n - n_1 - n_2 - n_3 - m_5 - \dots - m_s,$$
$$\dots$$
$$n_{s-1} \ge n - n_1 - n_2 - \dots - n_{s-2} - m_s.$$

Before these numbers can be used as limits of summation, however,
it must be observed that $n_1, n_2, n_3, \dots, n_s$ cannot be less than zero.


Hence the lower limit of summation on $n_1$ is (18) when that is positive;
otherwise it is zero; and similar statements must be made for the
other n’s. However, if (18) were negative, and if it were used as a
lower limit of summation, it would introduce certain excess terms
for which $n_1$ was negative. Due to the presence of $n_1!$ in the denominator
of (17) all these terms would be zero. Hence the use of (18)
would give a correct result.

A like statement applies for every other summation. Hence

$$P_n^{m_1, m_2, \dots, m_s} = \sum_{n_1 = n - m_2 - \dots - m_s}^{m_1} \; \sum_{n_2 = n - n_1 - m_3 - \dots - m_s}^{m_2} \cdots \sum_{n_{s-1} = n - n_1 - \dots - n_{s-2} - m_s}^{m_{s-1}} \frac{n!}{n_1!\,n_2! \cdots n_{s-1}!\,(n - n_1 - n_2 - \dots - n_{s-1})!}. \qquad (19)$$


This, then, is the formal solution of the problem. 
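As an informal check on the limits of summation, the nested sums can be evaluated mechanically and compared with a direct count. The Python sketch below is one possible arrangement under stated assumptions, not a transcription of (19): the names `count_permutations` and `brute_force` are hypothetical, the limits are trimmed to the range in which no term vanishes (so that no factorial of a negative integer arises), and each term n!/(n_1! ... n_s!) is accumulated as a product of binomial coefficients, which has the same value.

```python
from itertools import product
from math import comb

def count_permutations(n, m):
    """Distinct permutations of n things drawn from m[0] alike of one kind,
    m[1] alike of a second, and so on."""
    s = len(m)

    def inner(k, remaining):
        if k == s - 1:
            # n_s is fixed by the other n's; it must not exceed m_s
            return 1 if 0 <= remaining <= m[k] else 0
        low = max(0, remaining - sum(m[k + 1:]))   # lower limit, cf. (18)
        high = min(m[k], remaining)                # upper limit, trimmed
        return sum(comb(remaining, nk) * inner(k + 1, remaining - nk)
                   for nk in range(low, high + 1))

    return inner(0, n)

def brute_force(n, m):
    """Count sequences of length n in which kind k occurs at most m[k] times."""
    kinds = range(len(m))
    return sum(all(w.count(k) <= m[k] for k in kinds)
               for w in product(kinds, repeat=n))

# The illustration of the text: n = 7 with nine a's, two b's, three c's
print(count_permutations(7, (9, 2, 3)), brute_force(7, (9, 2, 3)))
# Four-letter permutations from the letters of "pepper"
print(count_permutations(4, (3, 2, 1)))
```

In the first case the lower limit on the number of a’s computed by the code is 2, in agreement with the argument leading to (18).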


PROBLEMS 


1. In how many ways can four cards in sequence in the same
suit be chosen from a full pack if the order of choice is immaterial,
so that

    six      six      eight
    seven    nine     six
    eight    eight    seven
    nine     seven    nine

are all regarded as identical?


2. In how many ways can a cribbage hand of six cards be dealt,
(a) if the order of dealing is taken into account, so that identical hands,
the cards of which appear in different orders, are regarded as different?
(b) if the order of dealing is neglected?


3. In how many ways can seven keys be arranged on a ring? 


4. A printing telegraph machine contains a number of sliding bars 
each capable of taking two positions in response to current pulses of 
two different types. When the bars have all been set, one and only 
one character is selected for printing. It is evident that the number 
of characters which the machine is capable of selecting will depend 
upon the number of bars. How many bars are required to handle 
50 characters? 


5. If in the printing telegraph of Problem 4 the bars are capable 
of taking three positions instead of two, how many bars are required? 


6. In how many ways can p +’s and −’s be placed in a row
so that no two −’s come together?


7. How many different combinations can be made of 10 objects,
3 of which are alike and 3 others alike, the remaining 4 being different,
(a) if the number of objects in a combination is not restricted; (b) if
it is equal to 4?


8. How many different connections must a telephone exchange be 
capable of setting up, if it accommodates 10,000 subscribers? 


The cylinder lock illustrated in Fig. 3 contains five tumblers.
These tumblers are in the form of pins, cut in two parts so that when
forced into the proper position by the edge of the key (Fig. 3b) they
offer no restraint to the rotation of the cylinder. The cuts may be
made at any one of ten points along the pin. When the key is out,
as in Fig. 3a, or if the wrong key is inserted, the cuts do not all
coincide with the edge of the cylinder, and it cannot move.

If a master-key is required for a number of locks, certain tumblers
may be cut in more than one place, as in Fig. 3c. When this is done,
one set of cuts is the same on the tumblers of all locks; they can
therefore all be opened by the same master-key. The other set of
cuts is different for every lock, so that the key corresponding to any
one will not operate the rest. In Fig. 3c, those cuts which are not in
line would be brought in line by the key shown in b. Thus this
particular lock could be operated by either key.

FIG. 3. (a) Lock in Normal Position, Cuts Not in Line. (b) Right Key in Place, Cuts in Line. (c) Master Key in Place, One Set of Cuts in Line.

The following problems refer to locks constructed on this system: 


9. How many different locks can be made without changing the 
keyway? (No double-cut pins.) 


10. When the key of a lock is stolen, and it is desired to protect
the owner against possible entry, the pins are taken out and interchanged.
If three of the pins happen to be cut differently, the
remaining two being alike, how many times may this be done without
duplication?


11. If double-cut pins are used on the first and fourth tumblers, to 
provide for a master key, how many distinct locks will it open? 


12. Each of five floors of a hotel is to have a separate and distinct 
master-key. How many rooms may each floor have, without per- 
mitting any guest’s key to open any but his own room, provided 
(a) only one double-cut pin is used; (b) two double-cut pins; (c) three
double-cut pins? 


REFERENCES FOR OUTSIDE READING


1. TODHUNTER: Algebra, pp. 286-297.

2. CHRYSTAL: Algebra, Vol. II, Chap. XXIII.

3. WHITWORTH: Choice and Chance, Chaps. I and II.

4. NETTO: Lehrbuch der Kombinatorik, Chaps. I, II, XIII.


CHAPTER III 
ELEMENTARY PRINCIPLES OF THE THEORY OF PROBABILITY 


§ 18. Complementary Probabilities 




The events “A happens” and “A does not happen” are
mutually exclusive. Hence, by Convention II, § 3, the probability
that “either A happens or A does not happen” is the
sum of their separate probabilities. But one or the other of
these two events is certain to occur, and therefore the sum
must be unity. Hence, if the probability that A happens is
denoted by $P(A)$ and the probability that A does not happen
by $P(\bar{A})$ it follows that

$$P(A) + P(\bar{A}) = 1,$$

or

$$P(\bar{A}) = 1 - P(A).$$

The numbers $P(A)$ and $P(\bar{A})$, which represent the probability
of an event taking place, and the probability of it not taking
place, are known as “complementary probabilities.”


§ 19. Unconditional Probabilities


The simplest sort of problems in the theory of probability 
are known as “ problems in unconditional probability.” Their 
principal characteristic is the assurance with which the condi- 
tions surrounding them can be stated. It is impossible to 
describe them more exactly, as their classification is of a vague 
and somewhat illogical nature, but the implication of the 
name, which is a very useful one, will become clear in the 
course of a few sections. They can frequently be solved by the
direct application of the fundamental definition of probability.
A few examples will illustrate this type of problem.


EXAMPLE 8.—What is the probability of throwing an ace with a die?


There are six faces to the die, only one of which may
appear. In an actual die they are not equally likely to appear, 
for the die is certainly unsymmetrical to a greater or lesser 
degree. The nature of this asymmetry is unknown, however, 
and therefore plays no part in the problem. There are other 
known characteristics of the faces, such as the number and 
arrangement of the dots, but these characteristics are not 
pertinent except in so far as they affect the symmetry of the 
die, and their effect in this respect is unknown. Hence it 
must be concluded that each of the six faces is “ equally 
likely” to appear. Of this complete group of six faces only 
one is an ace. Hence, by definition, the chance of an ace 
appearing is 1/6.


EXAMPLE 9.—The letters of the word tailor are written on cards.
The cards having been thoroughly shuffled, four are drawn in order.
What is the probability that the result is oral?


The number of permutations of six distinct things four at
a time is $\frac{6!}{2!} = 360$. These permutations differ in no known
pertinent respect except their identity. Hence they form a
complete group of equally likely events. Only one of these
events, however, is the word “oral.” Hence the answer to
the question is 1/360.


EXAMPLE 10.—The letters of the word pepper are written on cards.
The cards having been thoroughly shuffled, four are drawn in order.
What is the chance that the result is peep?


In this case a distinction must be made between cards
and letters. The permutations of the cards are all equally
likely. The permutations of the letters need not be. The
permutations of the cards form a complete group of 360 events
of which a certain subgroup gives the desired word. The
problem is to find the number of permutations in the subgroup.
Any permutation of the three cards bearing the p’s leaves the
result unchanged so far as letters are concerned. The same is
true of the e’s. Of course, every permutation of the p’s can
be combined with every permutation of the e’s so that the total
number of permutations which leave the group “peep”
unchanged is $P_2^3\,P_2^2 = 6 \times 2 = 12$. The desired subgroup
of permutations therefore numbers 12, and the answer is

$$\frac{12}{360} = \frac{1}{30}.$$
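The distinction between cards and letters in Examples 9 and 10 is easily made concrete. In the Python sketch below (the function name `chance_of_word` is an illustrative assumption), every ordered draw of cards is treated as a separate, equally likely event, and the draws that spell the desired word are then counted.

```python
from fractions import Fraction
from itertools import permutations

def chance_of_word(cards, target):
    """Probability that drawing len(target) cards in order spells target.
    Cards are distinct objects even when they bear the same letter."""
    draws = list(permutations(cards, len(target)))
    hits = sum("".join(d) == target for d in draws)
    return Fraction(hits, len(draws))

print(chance_of_word("tailor", "oral"))   # Example 9
print(chance_of_word("pepper", "peep"))   # Example 10
```

Both words give the same complete group of 360 ordered draws; only the size of the favorable subgroup differs.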

EXAMPLE 11.—First Appearance of the Psychic Research Problem.¹
A spiritualistic medium claims to be able to tell the color of a playing
card without seeing it. In order to test her claims an experiment is 
conducted with four red and four black cards. These cards are thoroughly 
shuffled and placed face down on the table. The medium is told that 
there are four red and four black cards, but presumably knows nothing 
as to their arrangement. The experimenter picks up a card and without 
either looking at it himself or showing it to the medium asks its color. 
If she answers “red,” he places it at one side of the table. If she answers 
“black,” he places it on the other side of the table. This process is 
repeated until all cards are exhausted. 

If the medium does not have the ability which she claims to possess, 
what is the chance that there will be just one black card in the pile that
should be red? 




The order in which the medium will call her “reds” and
“blacks” is, of course, unknown; but if she has no power of
detecting the nature of the cards, the order of calling will be
quite independent of that in which the cards actually appear.
Her chance of success would, in fact, be just the same if she
were to state the order in which she expected the cards to
appear before they were dealt; that is, if she claimed the
gift of “prophecy” instead of “occult understanding.” Suppose,
then, that we think of this order in which she calls the
cards, whatever it is, as a “standard order,” and ask for the
chance that the cards are so dealt as to match this standard
order to just the extent prescribed by the statement of the
problem.

1 This example, which recurs in various forms, has the following interesting history:
A certain pseudo-scientific hoax, of which the problem is a disguised formulation, was
under investigation by a friend of mine. He was anxious to formulate the number
of “reds” and “blacks,” and other features of the experimental procedure, so as to
make the chance of an accidental high score, and particularly of ambiguous scores
which the “medium” would undoubtedly regard as favorable, as small as possible.
It was necessary to have a reasonable proportion of “reds”; and the number of “cards”
which the “medium” would consent to handle was limited. We finally arrived at a
set-up which, while none too satisfactory, was the best we could get; and with fear
and trepidation my friend conducted the experiment. The outcome justified his fears:
one of the least probable results occurred. The “medium” got the lowest possible
score.

It is asserted that the medium knows that there are just 
four red and four black cards. It may therefore be accepted 
as a fact that she will call red and black exactly four times each; 
that is, the standard order contains four reds and four blacks. 

Now turning attention to the order in which the cards
appear, it is at once obvious that the $\frac{8!}{4!\,4!} = 70$ possible permutations
constitute a complete group of equally likely and
mutually exclusive events. Among these there is a certain
subgroup which matches the standard order in exactly three
reds. If the number of events contained in this subgroup can
be found, the problem will have been solved.

This can be done by thinking of the process of laying down
a permutation beside the standard order in such a way as to
satisfy the conditions of the problem. To start with, a black
card can be laid down in any one of the four red positions
of the standard order. After this has been done the remaining
red positions must all be filled by red cards. Then the red
card which remains can be placed in any one of the four positions
which are occupied by black cards in the standard order, after
which the remaining positions can only be filled with black
cards. There are therefore exactly 4·4 = 16 permutations
which satisfy the conditions of the problem. The answer to
the problem is therefore 16/70 or 8/35.
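The count in Example 11 can be verified by listing every way in which the four red cards may fall among the eight positions. In this Python sketch the standard order is assumed, without loss of generality, to call the first four positions red; the function name `chance_of_red_matches` is a hypothetical choice.

```python
from fractions import Fraction
from itertools import combinations

def chance_of_red_matches(k, reds=4, blacks=4):
    """Chance that exactly k red cards fall in positions called red."""
    total = reds + blacks
    red_calls = set(range(reds))    # standard order: these positions are called red
    deals = list(combinations(range(total), reds))   # positions of the red cards
    hits = sum(len(red_calls & set(d)) == k for d in deals)
    return Fraction(hits, len(deals))

print(len(list(combinations(range(8), 4))))   # size of the complete group
print(chance_of_red_matches(3))               # just one black card in the red pile
```

Because only the colors matter, a deal is fully described by the positions of its red cards, which is why combinations of positions suffice here.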


PROBLEMS 


1. A connector switch in a step-by-step exchange reaches ten
subscribers on each of ten levels, 100 subscribers in all. Every
subscriber is represented on such a switch. What is the chance that
Mr. A’s line appears on the third level?


2. Assume that the various operations of a connector switch consume
time as follows:

(a) First vertical step ............... [illegible] sec.
(b) Succeeding vertical steps ......... 0.10 sec.
(c) First horizontal step ............. 0.20 sec.
(d) Succeeding horizontal steps ....... 0.10 sec.


An observer visiting the exchange watches a connector until it 
operates, and notes the operating time. What is the probability that 
it is less than 0.66 sec.? What is the probability that it lies between 
0.66 and 1.50? 


3. Mr. A makes two calls during an hour and is twice called by 
other subscribers, of whom one is Mr. B. The calling subscribers, 
in case they find Mr. A busy, repeat their calls. Each call occupies 
one of the 60 minutes of the hour. What is the chance that Mr. B 
is successful on his first attempt to call? 


4. In the psychic research experiment of Example 11, what is 
the chance of the medium scoring 100 per cent?


5. If, in the psychic research experiment of Example 11, six red 
cards and two black cards are used, what is the chance of the medium 
scoring 100 per cent? 


6. If, in the psychic research experiment of Example 11, six red 
cards and two black cards are used, what is the chance that the 
medium places just one wrong in each color? 


7. A milkman starts on his route with ten dozen quarts of fresh
milk, together with five dozen quarts left over from the preceding day. 
Having delivered 100 quarts, he arrives at the home of Mrs. A, who 
receives one quart. What is the chance that it is stale? 


8. A batch of one thousand lamps is five per cent bad. If five 
are tested, what is the chance no defectives will appear? What is 
the chance the test batch will be forty per cent defective? 


§ 20. Conditional Probabilities 


The examples treated in the last section are stated with
great finality. Sometimes, however, the statement of a question
contains “provisos” which very materially alter the
probability desired. For instance, the question, What is the
probability that Christmas falls on Monday? has been found
to have the answer 1/7. But the question, What is the probability
that Christmas falls on Monday if it does not fall on
either Friday or Saturday? is also a proper subject for the
theory of probability, and it is immediately obvious that the
answer is no longer 1/7.

The answer to a question of this kind is called a “conditional¹
probability,” and will be represented by the symbol
$P_A(B)$, which may be read “the probability that B happens
if A does.” Problems of this sort are sometimes exceedingly
difficult to solve, but in many cases they are almost as simple
as if the restrictive conditions had not been applied. This is
true, for instance, in the case of the question asked above. If
Christmas does not fall on either Friday or Saturday it must
fall on one of the remaining days of the week, of which there
are five. As these are equally likely, the answer to the problem
is 1/5.

Another example is the following: 


EXAMPLE 12.—If, in the experiment in psychic research described
in Example 11, the first card to appear is black but is called red by the
medium, what is the chance that at the end of the trial there will be
exactly three red cards in the red positions?


The medium having called the first card red is left with 
three reds and four blacks. These she will call in some order, 
which is again our “ standard order.” 

As the first card dealt was black, the only way in which 
the remainder can match up in exactly three reds is for all 
of the remaining red positions in the standard order to be 
matched with red cards. If this is done there are left three 
blacks and one red with which to fill the four black positions. 
It is obvious that the red card can be placed in any one of the 
four positions, after which the placing of the three black cards 
gives no new arrangement. Thus it is seen that the total
number of permutations in the subgroup which satisfies the
conditions of the problem is 4. As the complete group consists
of P^7_{3,4} = 35, the desired probability is 4/35.

¹ Sometimes “contingent.”

In other words, the medium’s chance of scoring 75 per cent 
is reduced by half if her first attempt is wrong. 
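The count in Example 12 can be checked by brute force. The sketch below assumes one particular standard order for the seven remaining calls (any order with three reds and four blacks would serve), and the helper `color_orders` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction
from itertools import combinations

def color_orders(n_red, n_black):
    """Yield every distinct order of n_red 'R' cards and n_black 'B' cards."""
    n = n_red + n_black
    for red_slots in combinations(range(n), n_red):
        yield tuple("R" if i in red_slots else "B" for i in range(n))

# Remaining calls after the first (wrong) "red": three reds, four blacks.
# The particular order is arbitrary; only the counts matter.
standard_order = ("R", "R", "R", "B", "B", "B", "B")

# Four red and three black cards remain to be dealt.
orders = list(color_orders(4, 3))

# Favorable: all three remaining red positions receive red cards.
favorable = [
    deal for deal in orders
    if sum(call == "R" and card == "R"
           for call, card in zip(standard_order, deal)) == 3
]

p_example_12 = Fraction(len(favorable), len(orders))
print(len(favorable), len(orders), p_example_12)  # 4 35 4/35
```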

It is interesting to see how much greater her chances are
if she gets the first card right. Therefore the following
example may be considered:


EXAMPLE 13.—If, in the experiment in psychic research described in
Example 11, the medium places the first card correctly, what is her chance
of having exactly three correct cards in each position?


Note that the statement of this example is somewhat
different from the statement of the preceding one. The preceding
example stated definitely that the first card in the
standard order was black, but was called red by the medium.
As the problem is stated at present this is not true. All that
is known is that the medium called the first card correctly.
For the purposes of the problem, however, this makes no difference,
because of the fact that the two colors are equally
numerous and equally likely. The apparent difficulty can
be overcome by speaking, not of “reds” and “blacks,” but of
“cards of the first color” and “cards of the second color,”
meaning by cards of the first color, cards of that color which
first appears in the standard order.

The medium having first correctly called a card of the first
color is left with three of the first color and four of the second,
which she will call in a “standard order.” If the cards appear
in such a way as to misplace one card only of each color, a card
of the second color must of necessity appear in some position
which, in the standard order, is occupied by a card of the
first color. As there are only three such positions remaining
this can happen in just three ways. The remaining two positions
of the first color in the standard order must then be
filled by cards of the first color, after which there are left one
card of the first color and three of the second with which to
fill the four positions that are occupied by cards of the second
color in the standard order. As the card of the first color may
be placed in any one of the four positions, the total number of
permutations favorable to the conditions of the problem is
3·4 = 12.

As for the total number of permutations in the complete
group, this is easily seen to be P^7_{3,4} = 35. The answer to the
problem is therefore 12/35.

If, therefore, the medium guesses correctly at the first
attempt, her chances of scoring exactly 75 per cent are increased
by half.
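The statements "reduced by half" and "increased by half" can be verified against the unconditional result of Example 11. The following is a small sketch using binomial coefficients in place of the permutation symbols of the text; the variable names are assumptions made for the illustration.

```python
from fractions import Fraction
from math import comb

# Example 11 (unconditional): one wrong card in each colour, 4 red + 4 black.
p_uncond = Fraction(comb(4, 1) * comb(4, 1), comb(8, 4))

# Example 12: first call wrong. The 3 remaining positions of that colour must
# all be right; one of the 4 positions of the other colour holds the stray card.
p_first_wrong = Fraction(comb(3, 0) * comb(4, 1), comb(7, 3))

# Example 13: first call right. One of the 3 remaining positions of the first
# colour is wrong, and one of the 4 positions of the second colour is wrong.
p_first_right = Fraction(comb(3, 1) * comb(4, 1), comb(7, 3))

print(p_uncond, p_first_wrong, p_first_right)  # 8/35 4/35 12/35
print(p_first_wrong / p_uncond, p_first_right / p_uncond)  # 1/2 3/2
```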

If the problem had been stated explicitly, as was done in 
the case of Example 12, the argument would have been just 
the same except that the first color and second color would 
have been either red and black, respectively, or black and red, 
respectively, according to the exact conditions laid down. 
As has been said above, the reason why it is possible to treat the 
case generally is that the two colors are equally likely and 
equally numerous. A further example can be added in which 
this is not true: 


EXAMPLE 14.—If, in the psychic research experiment explained in
Example 11, there are six red and two black cards, and if the first card
to appear is red, but is called black by the medium, what is the chance
that the black pile will contain only one red card?


The standard order is now introduced by a black card.
The card that first appears, however, is red. The standard
order therefore still contains six reds and one black, while
five red cards and two black cards remain to be dealt. It is
simplest here to regard the seven remaining cards as fixed and
to permute the remainder of the standard order against them.
There are then just P^7_{6,1} = 7 possible permutations, all of
which are equally likely but not all of which match to
the degree which is required by the conditions of the problem.
They constitute the complete group required by the definition
of “probability.”

In order to satisfy the conditions of the problem the one
remaining black position must coincide with one of the two
black cards, after which the positions of the reds are
immaterial. There are therefore just
two permutations which satisfy the conditions of the problem.
They constitute the group the probability of which is desired,
and the answer is 2/7.
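As a check on Example 14, the sketch below enumerates the card orders directly. It assumes the remaining calls are six "red" and one "black" (their order is arbitrary) and that five red and two black cards remain. Counted this way there are 21 equally likely card orders, of which 6 are favorable, which reduces to the same value 2/7.

```python
from fractions import Fraction
from itertools import combinations

# Remaining calls after the first "black": six reds and one black.
calls = "RRRRRRB"

total = 0
favorable = 0
# Choose the two places (out of seven) at which the black cards fall.
for black_slots in combinations(range(7), 2):
    cards = ["B" if i in black_slots else "R" for i in range(7)]
    # Red cards landing on a "black" call would add to the black pile.
    reds_called_black = sum(c == "B" and x == "R" for c, x in zip(calls, cards))
    total += 1
    if reds_called_black == 0:
        favorable += 1

p_example_14 = Fraction(favorable, total)
print(favorable, total, p_example_14)  # 6 21 2/7
```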




Before passing on to the next subject it should be noted 
that the method of solution used in these problems consists 
solely of the application of the definition of probability and 
the second general law of composition of events. Many 
problems of a much more complicated nature are capable of 
formal solution by the same means, though the actual numerical 
computation is often much more difficult. Success in treating 
problems of this sort is contingent on two essentials: a vivid 
and accurate mental picture of what is desired, and constant 
caution that the conditions regarding equal likelihood, mutual 
exclusiveness and independence are not violated. 

Finally, it must not be thought that there is only one way 
of obtaining the solution of such problems as these. That is 
rarely true in any mathematical work. In the present instance 
the answers to the different problems are inter-related in such 
a way that some of them can be obtained from others by 
simpler processes than those which we have here employed. 
A little later on, when the ideas which are needed for this 
purpose have been introduced, the problem will again be 
considered in order to show how this can be done. 


PROBLEMS 


1. In Problem 2, § 19, if the connector has not come to rest within
0.66 second, what is the chance that it will come to rest within the 
next 0.33 second? 


2. In Problem 2, § 19, if the connector has not come to rest 
within 0.66 second after starting, what is the chance that it will come 
to rest within the following intervals after starting: 


1.00-1.50,
0.66-10.00,
0.20-0.50?


3. If, in the problem of Example 11, the first two cards drawn are 
black, but the medium calls one black and the other red, what is her 
chance of having a score of 75 per cent? 


4. If, in the psychic research experiment, six red and two black
cards are used, and if the first card to appear is black, but is called
red by the medium, what is the chance that she will score 75 per cent?
Is the answer the same as in Example 14? Why?


5. If, in the psychic research problem, six red and two black cards
are used, and if the first card to appear is red and is correctly called, 
what is the chance of a 75 per cent score? 


6. The product of a lamp manufacturing concern has averaged 
5 per cent bad over an entire year. A small retailer gets a shipment of 
twenty cartons of five. It does not follow, of course, that he has 
received just five bad lamps, for his “sample” of the gross product 
may have been better or worse than the average. A customer pur- 
chases one carton. How much greater is his chance of having all good 
lamps if the shipment was 1 per cent better than the average, than it
would have been if the shipment had been 1 per cent worse than the 
average? 


7. What is the chance of throwing an ace with an unsymmetrical 
die, if the ace is 5 per cent more likely than the adjacent sides, and 
the six 10 per cent less likely? 


§ 21. Compound Probabilities 


So far no cases have been considered which involved the 
simultaneous occurrence of more than one event. This will 
be the next subject of study. The general law which governs 
such cases reads as follows: 


The probability that event A occurs and is accompanied by 
event B is the product of the probability that A occurs by the 
conditional probability that if A occurs B likewise occurs. In 
symbolic form 

P(AB) = P(A) P_A(B). (20)


¹ If A and B can occur simultaneously the law is true as stated. Sometimes there
is a sequence of events as in the statement “a man is shot and dies.” In this case
“accompanied” must be replaced by “followed.” When this is true the number given
by this theorem represents the probability of A happening and being followed by B.
The order cannot be reversed.

For instance, there is a certain chance that a man will be shot. If he is shot there
is a chance that he will die. The product of these is the probability that he will die 
from a gunshot wound. It is not the same as the chance that he ‘‘dies and is then 
shot” — the latter being a much less common occurrence. 

I would use the word “followed” except that it conveys a time-signification which 
is quite foreign to the subject. 




This law is analogous to the second general law of com- 
position of events and is capable of general proof as we shall 
see in §47. For the present, however, we do not have the 
necessary equipment for such a general proof, and shall there- 
fore content ourselves with a special case. 

Let us begin with the consideration of a particular example 
exactly similar to Example 10: 


EXAMPLE 15.—The letters of the word tooth are written on cards,
which are then thoroughly shuffled. If two are drawn in order, what is
the chance that they yield the word to?


The method of § 19 gives at once the answer 1/5, but the
argument can be phrased in a slightly different form. To
get to, it is necessary to get first t, then o. But the first
draw results in one of five cards, and the second draw in one of
the four which remain. The various possibilities may then be
listed schematically thus:


    t*           o           o           t*          h
 o* o* t h    t o t h     t o t h    t o* o* h    t o o t


In all, there are 5·4 of them. Of these possibilities the starred
ones lead to to. There are 2·2 of them. Hence the result
is (2·2)/(5·4). But this fraction naturally falls apart into
(2/5)(2/4), the first term of which corresponds, not only in value
but in the way in which it arrived as well, to the unconditional
probability of drawing a t; while the second term likewise
corresponds to the conditional probability of drawing an o
if t was already drawn. Hence the example checks with the
law.
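The two ways of arriving at the answer to Example 15 can be compared in a short sketch: a direct count over the 5·4 ordered draws, and the product of an unconditional by a conditional probability as in (20). The variable names are chosen for the illustration only.

```python
from fractions import Fraction
from itertools import permutations

cards = list("tooth")

# Direct count: every ordered pair of distinct cards (5*4 = 20 of them).
pairs = list(permutations(cards, 2))
hits = [p for p in pairs if p == ("t", "o")]
p_direct = Fraction(len(hits), len(pairs))

# Law (20): P(t) times the conditional probability of o once a t is gone.
p_t = Fraction(cards.count("t"), len(cards))
rest = cards.copy()
rest.remove("t")
p_o_given_t = Fraction(rest.count("o"), len(rest))
p_product = p_t * p_o_given_t

print(len(hits), len(pairs), p_direct, p_product)  # 4 20 1/5 1/5
```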

In general, if “event A” is itself a subgroup of n events
which forms part of a complete group of m; and if every
member of the complete group has associated with it a complete
group of m' subsequent events, so that there are m of
these “subsequent groups”; and finally if in each of these
subsequent groups which are associated with “event A”
there are n' which produce “event B”; if all these statements
are true the aggregate constitutes a total of mm' compound
events of which a total of nn' produce “A followed by B.”
The desired probability is therefore nn'/mm' = (n/m)(n'/m'),
of which the two factors are, by definition, “the unconditional
probability of A” and “the conditional probability of B.”

The theorem is of fundamental importance, and it must
be repeated that our proof places restrictions upon it to which
it is not, in fact, subject. For instance, if Example 15 read


EXAMPLE 16.—“The letters t, o, o, t, h are written on cards, which
are shuffled. One is drawn and the letter noted. If it is not a t, the t’s
are sorted out and discarded. Then a second shuffle and draw are carried
out. What is the probability of t, o?”


the possibilities would be

    t*         o       o        t*        h
 o* o* t h    o h     o h    t o* o* h   o o


The events of the first group are no longer all associated with
equally numerous subsequent groups: but though this violates 
the conditions of the “ proof,” it obviously does not invalidate 
the theorem. In § 22 we shall be able to extend the proof to 
cover this case, though it will still not be perfectly general. 
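Why the unequal subsequent groups of Example 16 upset the counting proof but not the theorem can be seen in a small sketch. It weights each first draw by 1/5 and each second draw by the size of its own group. The naive ratio of starred to listed possibilities is computed alongside for contrast. All names here are illustrative.

```python
from fractions import Fraction

cards = list("tooth")
p_to = Fraction(0)
leaves = 0
starred = 0

for i, first in enumerate(cards):
    rest = cards[:i] + cards[i + 1:]
    if first != "t":
        # The t's are sorted out and discarded before the second draw.
        rest = [c for c in rest if c != "t"]
    leaves += len(rest)
    if first == "t":
        starred += rest.count("o")
        # Law (20): P(first) times the conditional probability of o.
        p_to += Fraction(1, len(cards)) * Fraction(rest.count("o"), len(rest))

naive_ratio = Fraction(starred, leaves)
print(p_to, naive_ratio)  # 1/5 2/7
```

The fourteen listed possibilities are not equally likely, so the ratio 4/14 is not a probability. The product rule still gives 1/5.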
Finally, it should be pointed out that the theorem can be 
extended to any number of consecutive events: the chance that 
all will occur in the prescribed order is the product of the 
proper set of probabilities. Some examples are given: 


EXAMPLE 17.—The letters of the word tailor are written on cards.
The cards being first thoroughly shuffled, four are drawn in order.
What is the probability that the result is oral?


Of the six equally likely choices for the first letter only one
is an o. Hence the chance of getting a group beginning with
o is 1/6. After this card has been drawn five remain. Hence if
o is first drawn, the chance of drawing r next is 1/5. Hence
the chance of drawing o followed by r is 1/30. If o and r are
first drawn the chance of next drawing a is 1/4. Therefore the
chance of drawing o r a is 1/120. Finally, if these three letters
have appeared, the chance that the next trial will produce
l is 1/3. Thus the answer to the problem is 1/360. Naturally
this is the same result as was obtained in Example 9.

The solution of the problem as it is here given has required 
the use of the unconditional probability of drawing an o,
together with three additional probabilities, each of which 
stands in the relationship of a conditional probability to all 
those which precede it. 
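The chain of conditional probabilities used in Example 17 lends itself to a short routine. The function name `chain_probability` is an assumption of this sketch; it multiplies, letter by letter, the chance of drawing the required card from those that remain.

```python
from fractions import Fraction

def chain_probability(cards, word):
    """Chance of spelling `word` by successive draws without replacement."""
    remaining = list(cards)
    p = Fraction(1)
    for letter in word:
        if letter not in remaining:
            return Fraction(0)
        # Conditional probability of this letter, given all earlier draws.
        p *= Fraction(remaining.count(letter), len(remaining))
        remaining.remove(letter)
    return p

p_oral = chain_probability("tailor", "oral")
print(p_oral)  # 1/360
```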


EXAMPLE 18.—The letters of the word pepper are written on
cards. The cards having been thoroughly shuffled, four are drawn in
order. What is the chance that the result is peep?


In this case the chance of first drawing a p is 3/6. Thereafter
the chance of drawing an e is 2/5, and the chance of drawing
another e, 1/4. If all these appear the chance of drawing a
p on the fourth attempt is 2/3. Thus the result sought is

(3/6)·(2/5)·(1/4)·(2/3) = 1/30.
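The product obtained for Example 18 can be confirmed by listing every ordered draw of four cards, treating the six cards as distinct objects even where their letters coincide. This is a brute-force sketch, not the method of the text.

```python
from fractions import Fraction
from itertools import permutations

cards = list("pepper")

# 6*5*4*3 = 360 equally likely ordered draws of four distinct cards.
draws = list(permutations(cards, 4))
hits = sum("".join(d) == "peep" for d in draws)

p_peep = Fraction(hits, len(draws))
print(hits, len(draws), p_peep)  # 12 360 1/30
```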


EXAMPLE 19.—In the psychic research experiment of Example 11,
what is the chance that the first card is correctly called, and that only
one card of each color is incorrectly placed?


The chance that the first card is correctly called is 1/2. If
so, the conditional probability that three cards in each pile
are correct is found from Example 13 to be 12/35. Therefore
the answer to the problem is (1/2)·(12/35) = 6/35.


EXAMPLE 20.—In the psychic research experiment of Example 11,
what is the chance that the first card drawn is wrongly placed, and that
all but two are correctly placed?


The chance that the first card is wrongly placed is evidently
1/2. If the first card was wrongly placed, the chance
of having only two wrong is, by Example 12, 4/35. Hence the
solution of this problem is (1/2)·(4/35) = 2/35.

EXAMPLE 21.—Each of two groups of cards contains four reds and
four blacks. One group, which we shall call Group I, is first thoroughly
shuffled, and laid out face down. Group II is then shuffled and its
cards dealt out on top of those of Group I. What is the chance that the
first card dealt from each group is black, and that of the remaining pairs
just five are matched colors?


The chance that the first card of Group I is black is 1/2.
If so, the chance that the card placed on it is black is 1/2. Therefore,
the chance that the first card is black and matched by
a black is 1/4. But if the first card is matched, the chance of
having just two unmatched pairs is 12/35, as is easily seen by
comparison with Example 13. Therefore the answer to the
problem is (1/4)·(12/35) = 3/35.
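Example 21 is small enough to enumerate in full. The sketch below pairs every distinct color order of Group I with every distinct color order of Group II (70 each, all equally likely) and counts the pairs meeting both conditions. The helper `color_orders` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations

def color_orders(n_red, n_black):
    """All distinct orders of n_red 'R' cards and n_black 'B' cards."""
    n = n_red + n_black
    return [tuple("R" if i in reds else "B" for i in range(n))
            for reds in combinations(range(n), n_red)]

orders = color_orders(4, 4)          # 70 orders for each group
favorable = 0
for g1 in orders:
    for g2 in orders:
        if g1[0] == "B" and g2[0] == "B":
            # Count matched pairs among the remaining seven positions.
            matched = sum(a == b for a, b in zip(g1[1:], g2[1:]))
            if matched == 5:
                favorable += 1

p_example_21 = Fraction(favorable, len(orders) ** 2)
print(favorable, len(orders) ** 2, p_example_21)  # 420 4900 3/35
```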


PROBLEMS 


Using the time values of Problem 2, § 19, answer the following 
questions: 


1. If the observer notes the operating times for 10 calls, what is
the probability that all are less than 0.66 second? 


2. What is the probability that just 2 of the calls have less than 
0.66 second operating time? 


3. Referring to Example 17, if each card drawn is replaced and 
shuffled before another is drawn, what is the probability of oral?
Has it been increased or decreased by replacing the cards? 


4. In Example 17 what is the chance of a result till if the cards 
are not replaced? If they are? Has replacing the cards increased or 
decreased the result? 


The next three problems refer to the experiment explained in
Example 21, except that it is assumed that six red and two black
cards are used:


5. What is the probability that the first card of Group I is black, 
the first from Group II red, and that there are just two unmatched 
pairs? 

6. What is the probability that the first cards are both red and
that there are just two unmatched pairs?


7. What is the probability that the first cards are red and black 
respectively, and that there are just two unmatched pairs? 


8. If in Example 21 the cards of Group I are not shuffled, but are 
laid out in order, reds being first; and if the cards of Group II are 
shuffled, what is the chance of just six matched pairs?


9. Under the conditions of Problem 8, what is the chance that the 
first pair are both red, and two of the remaining pairs unmatched? 




10. Under the conditions of Problem 8, what is the chance 
that the first pair are both black, and two of the remaining pairs 
unmatched? 


§ 22. Alternative Compound Probabilities 


Many problems, which resemble those of the last section in 
form, require the use of Convention II in their solution. 


EXAMPLE 22.—The numbers 1, 2, 3, 4, 5 are written on cards, of
which two are drawn without replacement. What is the chance that the
combination thus drawn is even?


Obviously, what the first card is does not matter, and the
second is as likely to be one as another; hence the answer must
be 2/5. But suppose the problem is attempted by the argument
of § 21. The possible results appear schematically as follows:


     1            2            3            4            5
 2* 3 4* 5    1 3 4* 5    1 2* 4* 5    1 2* 3 5    1 2* 3 4*


Obviously any number is admissible for the first choice, hence
the unconditional probability of drawing an allowable first 
number is 1. But what is the conditional probability of a 
suitable second choice? Every first choice is associated with 
four possible second choices, of which sometimes one and some- 
times two are suitable. Obviously the situation is not covered 
by the proof given for the fundamental theorem on compound 
probabilities. 

On the other hand, it is easy to find the probability of
“an even number beginning with 1.” It is (1/5)·(2/4) = 1/10. The
same is true of “an even number beginning with 3,” or “with
5”; while the probabilities for numbers beginning with 2
and 4 are each (1/5)·(1/4) = 1/20.

These things are mutually exclusive: therefore the chance
of one or the other of the five happening is the sum of their
separate probabilities, which is 2/5, and as an even number can
result in no other way, this must be the desired probability.
Naturally, it checks the result obtained directly. In general:




If an event can be expressed as the sum of a number of alternative
compound events, which compound events are mutually
exclusive, and for which the separate probabilities can be found,
the probability of the original event can be found by Convention II.
It is essential that no manner of occurrence of the original event
shall be overlooked.
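Applied to Example 22, the rule amounts to summing five compound probabilities, one for each possible first number. The sketch below does this and compares the sum with a direct count over all ordered pairs. The variable names are chosen for the illustration.

```python
from fractions import Fraction
from itertools import permutations

numbers = [1, 2, 3, 4, 5]

# Sum over the mutually exclusive alternatives "first number is k".
p_even = Fraction(0)
for first in numbers:
    rest = [n for n in numbers if n != first]
    p_first = Fraction(1, len(numbers))
    p_even_given_first = Fraction(sum(n % 2 == 0 for n in rest), len(rest))
    p_even += p_first * p_even_given_first

# Direct count: the two-place number is even when its second digit is even.
pairs = list(permutations(numbers, 2))
p_direct = Fraction(sum(b % 2 == 0 for _, b in pairs), len(pairs))

print(p_even, p_direct)  # 2/5 2/5
```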


Now any such process of division of the original event must
be made by introducing extraneous conditions not contemplated
in the original question. For instance, Example 22
neither expresses nor implies any condition upon the first
number drawn. That is evident from the way the solution
was obtained in the first sentence after the statement of the
example. Yet each of the separate probabilities used in the
second method of solution is obtained by introducing one of
these “extraneous” events. If we call the result of the
first drawing A, and “an even two-place number” B, the
process appears symbolically as


\[ P(B) = \sum_{A} P(A)\, P_A(B). \qquad (21) \]


It is particularly evident, in this symbolic form, that the
events A, like the differential elements in an integral, are a
sort of “catalytic agent,” introduced for the purpose of
enabling our computation to be carried out, though not
themselves a part of the result. It should also be noted that,
though all the probabilities P_A(B) on the right-hand side are
“conditional,” the P(B) to which they give rise is not.¹

As for the range of summation implied in the Σ, it is always
allowable to have it cover the complete group of events A;
but if some P_A(B) is zero, the A to which it corresponds may
be omitted without error. Thus, if Example 22 had asked
for the probability of an “even number less than 30,” combinations
beginning with 3, 4 or 5 could have been omitted;
they could also equally well have been included since the
accompanying conditional probabilities would have been zero.


¹ That is, not conditional in any way upon the set A: the whole process may be
predicated upon the desire to obtain a probability which is conditional with respect to
some event or events not entering the present discussion.

Finally, a particular case which is of sufficient importance
to merit special mention is that in which the extraneous group
consists of an event A and its complement Ā. Then the
formula reads:


P(B) = P(A) P_A(B) + P(Ā) P_Ā(B).


Thus in Example 22 the first number must be either even (A)
or not even (Ā), the respective probabilities being

P(A) = 2/5,  P(Ā) = 3/5.

As the conditional probabilities associated with them have
already been said to be

P_A(B) = 1/4,  P_Ā(B) = 2/4,

the solution of the example is

P(B) = (2/5)·(1/4) + (3/5)·(2/4) = 2/5

as before.

EXAMPLE 23.—If the two groups of cards to which reference is made
in Example 21 contain six red and two black cards each, what is the
chance that the first pair is matched and the score is 75 per cent?

In order that the conditions of this example may be fulfilled,
the first card must be either red in both groups, or black
in both groups. However, the chance that it is red in Group I
is 3/4, and if so, the chance that it is red in Group II is 3/4 also.
Thus the chance that it is red in both groups is 9/16. On the
other hand, the chance that it is black in Group I is 1/4, and if
so, the chance that it is black in Group II also is 1/4, so that the
chance of it being black in both groups is 1/16.

These two events are mutually exclusive, and to that
extent satisfy the conditions laid down upon the set of “events
A” in (21). They do not constitute a complete set: it is
possible for the first card of Group I to be red, and the first
of Group II black, or vice versa; but the conditional probability
of event B (that is, of “the first pair matched, and the score
75 per cent”) is then zero. Hence it is not necessary to evaluate
the probabilities of these remaining possibilities.

As for the conditional probabilities of 75 per cent matching
in the two cases which are not trivial, that is, the quantities
represented by P_A(B) in (21), these are easily found to be
10/21 and 6/7, respectively. Hence the answer to the problem
is (9/16)·(10/21) + (1/16)·(6/7) = 9/28.
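The two conditional probabilities quoted for Example 23 follow from the same counting used in Example 13. A brief sketch, in which the helper `p_one_wrong_each` is a name assumed for the illustration: after a matched first pair, r red and b black positions remain to be filled by r red and b black cards, and exactly one position of each color must receive the wrong color.

```python
from fractions import Fraction
from math import comb

def p_one_wrong_each(r, b):
    """r red and b black positions, r red and b black cards:
    chance that exactly one position of each colour is wrongly filled."""
    return Fraction(comb(r, 1) * comb(b, 1), comb(r + b, b))

p_both_red = Fraction(6, 8) * Fraction(6, 8)       # first pair red, red
p_both_black = Fraction(2, 8) * Fraction(2, 8)     # first pair black, black

cond_red = p_one_wrong_each(5, 2)                  # 5 red, 2 black remain
cond_black = p_one_wrong_each(6, 1)                # 6 red, 1 black remain

p_example_23 = p_both_red * cond_red + p_both_black * cond_black
print(cond_red, cond_black, p_example_23)  # 10/21 6/7 9/28
```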


EXAMPLE 24.—If the two groups of cards to which reference is
made in Example 21 contain six red and two black cards each, what
is the chance that the first pair fails to match and the score is 75 per cent?


The chance that the first card is red in Group I is 3/4, and
if so, the chance that it is black in Group II is 1/4. The combination
is one of two in which the cards of the first pair
fail to match. Its probability is evidently 3/16. The chance
that the first card of Group I is black and that of Group II
red is likewise 3/16. These are the unconditional probabilities
P(A).

As for the conditional probabilities, the first is obviously
identical with the solution of Example 14.¹ It is 2/7. The
other is also found to be 2/7. The substitution of these values
in (21) leads to the result (3/16)·(2/7) + (3/16)·(2/7) = 3/28.


¹ As a matter of fact, so far as mathematical ideas are concerned, there is no difference
whatever between the “standard order” which the medium sets up in the psychic
research example, and the order in which the cards of Group I appear in Example 21
and those which follow it. But there is a psychological difference. For, if I am not
mistaken, the medium who was confronted with the necessity of calling “red” six times
and “black” but twice would be almost certain to say “red” the first time; so that
we are not justified in saying that any one of her eight words is as likely as any other to
be first, though we are justified in the assertion that any one of the eight cards of Group
I is as likely as any other to be first.

Perhaps this fact may be used to emphasize the point about the axiom of equal
likelihood to which we have already referred in § 3. There is a vast difference between
the assertions “Two events are equally likely when they differ in no known pertinent
attribute” and “Two events are equally likely when we do not know what difference
their pertinent attributes make.” The first statement leaves us with many situations
to which the fundamental method of measuring probability cannot be applied, as in
the present instance where, though I feel certain the psychological bias exists, I am
wholly unable to state its extent. The second virtually says, “Whenever it is
impossible to measure the probabilities of a group of events, they are equally likely,”
which is a sheer absurdity.

It is the failure, on the one hand, to clearly express this idea, and on the other to
clearly grasp it, which has led to much of the argument over the dogmas of “insufficient
reason” and “cogent reason” to which reference will again be made in § 48.
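The results of Examples 23 and 24 can be checked together by enumerating both groups. The sketch assumes, as in Example 21, that every color order of each group is equally likely; `color_orders` is again a hypothetical helper. Since a 75 per cent score must begin with either a matched or an unmatched pair, the two results should add up to the unconditional chance of that score.

```python
from fractions import Fraction
from itertools import combinations

def color_orders(n_red, n_black):
    """All distinct orders of n_red 'R' cards and n_black 'B' cards."""
    n = n_red + n_black
    return [tuple("R" if i in reds else "B" for i in range(n))
            for reds in combinations(range(n), n_red)]

orders = color_orders(6, 2)          # 28 orders for each group
matched_first = 0
unmatched_first = 0
for g1 in orders:
    for g2 in orders:
        # A 75 per cent score means six matched pairs out of eight.
        if sum(a == b for a, b in zip(g1, g2)) != 6:
            continue
        if g1[0] == g2[0]:
            matched_first += 1
        else:
            unmatched_first += 1

total = len(orders) ** 2
p_example_23 = Fraction(matched_first, total)
p_example_24 = Fraction(unmatched_first, total)
print(p_example_23, p_example_24, p_example_23 + p_example_24)  # 9/28 3/28 3/7
```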


§ 23. Some Instructive Illustrations; The Psychic Research 
Problem 


Certain interesting relations exist between the probabilities
which have been obtained in the case of the psychic research
example. In the first place, confining our attention to the case
of four red and four black cards, there is obviously no pertinent
distinction between the four red positions in the standard
order which would make a black card more likely to fall in
one of them than another. Hence the unconditional probability
that the medium is wrong the first time she says “red”
must be the same as the unconditional probability that she is
wrong the second time she says “red,” or the third, or the
fourth. Furthermore, since the card that falls in any of
these positions is just as likely to be black as red, this unconditional
probability is 1/2. All this is axiomatic.

Likewise there is no distinction between the first occurrence
of the word “red” and its subsequent occurrences which
would cause the conditional probability of a 75 per cent score
to be different if one were known to have been incorrectly
called rather than another. That is, if by accident we happened
to observe that a card which the medium called “red”
was actually black, it would not matter whether it was the
first, or some subsequent occurrence of the word: the probability
of a 75 per cent score would in either case be 4/35, as
obtained in Example 12.

But if a 75 per cent score is to be obtained at all, one or
the other of the red positions in the standard order must be
filled by a black card. As the four compound events (“first
red position occupied by a black card and a 75 per cent score,”
“second red position occupied by a black card and a 75 per
cent score,” etc.) are mutually exclusive, it follows that the
unconditional probability of a 75 per cent score is the sum
of four compound probabilities, each equal to (1/2)·(4/35). This
works out to be 8/35, which is, of course, the same as the result
already obtained in Example 11.

Another result which can be obtained out of this same
argument is the following: To say that the first card is correctly
called and the score is 75 per cent, is equivalent to
saying that either the second, the third, or the fourth black
card is incorrectly called, the rest being correct in each case.
The probability of each of these alternatives, however, is
(1/2)·(4/35) = 2/35. Hence the probability that the first card is
correctly called and the score is 75 per cent works out to be
2/35 + 2/35 + 2/35 = 6/35, a result which has already been obtained.

Another result may easily be obtained from the use of (20),
if we regard “event A” as meaning that the first card is
correctly called, and “event B” that a 75 per cent score is
obtained. In this case, (20) states that the probability that
the first card is called correctly and the score is 75 per cent
is the product of the unconditional probability of calling the
first card correctly, which is known to be 1/2, by the conditional
probability of obtaining such a score under these circumstances,
which we denote merely by P_A(B). Equating
(1/2) P_A(B) to the result obtained in the last preceding paragraph,
it is found that P_A(B) = 12/35. This again is identical
with the result of Example 13.
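The three relations of this section reduce to a few lines of exact arithmetic. The sketch takes the values 1/2 and 4/35 from the text and uses illustrative variable names.

```python
from fractions import Fraction

p_wrong_at_position = Fraction(1, 2)   # a given call of one colour is wrong
p_75_given_wrong = Fraction(4, 35)     # Example 12

# Four mutually exclusive compound events, one for each position of a colour.
p_75 = 4 * p_wrong_at_position * p_75_given_wrong

# First card right: the error of that colour lies in one of three positions.
p_first_right_and_75 = 3 * p_wrong_at_position * p_75_given_wrong

# Solving (20) for the conditional probability P_A(B).
p_first_right = Fraction(1, 2)
p_75_given_first_right = p_first_right_and_75 / p_first_right

print(p_75, p_first_right_and_75, p_75_given_first_right)  # 8/35 6/35 12/35
```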

Many more relationships of this sort can be built up 
between the numbers already obtained. These are sufficient, 
however, to illustrate to what extent the theorems developed 
in the last few sections are capable of simplifying the solution 
of problems of this sort. 


§ 24. Some Instructive Illustrations; A Generalization of the 
Psychic Research Problem 


After having used the psychic research experiment explained 
in Example 11 so profusely for illustrative purposes, it would 
be unnatural to pass it by finally without obtaining a solution 
of somewhat greater generality than the special cases already 
considered. Hence the following general case is given:




EXAMPLE 25.—Each of two sequences contains m events of one kind
and n of another. The sequences are placed in one-to-one correspondence
by a method of choice which is not influenced by the characteristics that
differentiate the two kinds. What is the chance that there are just p pairs
which fail to match?


Phrased in terms of “cards” and “medium” it reads:


EXAMPLE 26.—If, in the psychic research experiment explained in
Example 11, there are m red and n black cards, what is the chance of
just p incorrect cards in each pile?


The simplest way to solve the problem is by means of the
fundamental definition of probability, just as Example 11
itself was solved. The order in which the medium calls the
cards forms a “standard order,” while the order in which
the cards actually appear may be any one of the P^{m+n}_{m,n} possible
permutations of m red and n black things. These represent
a complete group of equally likely and mutually exclusive
events. Certain of these permutations fail to match the
standard order in exactly p of the m reds and exactly p of the
n blacks, and thus form a subgroup which meets the conditions
of the problem. Put in other words this says that those
positions which are red in the standard order must be filled by
some possible permutation of p blacks and m − p reds, while
those which are black in the standard order must be filled by
some possible permutation of p reds and n − p blacks. Each
of the possible ways in which the standard red positions can
be filled is capable of association with each of the possible
ways in which the standard black positions can be filled. Therefore
the total number of events in the subgroup is


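$$P^{m}_{p,\,m-p}\;P^{n}_{p,\,n-p}.$$

The desired probability is therefore

$$P(p) = \frac{P^{m}_{p,\,m-p}\,P^{n}_{p,\,n-p}}{P^{m+n}_{m,\,n}} = \left(\frac{m!\,n!}{p!}\right)^{2} \frac{1}{(m-p)!\,(n-p)!\,(m+n)!}.$$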




This can also be written in the form

$$P(p) = \frac{C^{m}_{p}\,C^{n}_{p}}{C^{m+n}_{m}}, \tag{22}$$

a form which is itself capable of logical interpretation.


One of the many elegant relationships between binomial coefficients
can be obtained from (22). In a trial of the sort under consideration
there must either be no incorrect cards in each pile, or one incorrect
card in each pile, or two or three or some other number not exceeding
the smaller of the integers m and n. This means that the sum

$$\sum_{p=0}^{m \text{ or } n} P(p)$$

is equal to unity, or

$$\sum_{p=0}^{m \text{ or } n} \frac{C^{m}_{p}\,C^{n}_{p}}{C^{m+n}_{m}} = 1.$$

This equation consists on the left-hand side of a number of fractions,
and on the right-hand side of the integer 1. The validity of the
equation will remain unchanged if every term in both members is
multiplied by the same factor. Suppose $C^{m+n}_{m}$ is chosen as this
factor: then the equation becomes

$$\sum_{p=0}^{m \text{ or } n} C^{m}_{p}\,C^{n}_{p} = C^{m+n}_{m}.$$


This equation says that if any two columns of Appendix III are 
chosen and corresponding entries multiplied by one another until the 
end of the shorter column is reached, the sum of all the partial 
products thus obtained will itself be a binomial coefficient. Moreover, 
it must be in the particular column in Appendix III, the heading of 
which is the sum of the headings of the two columns chosen, and in 
a row which is denoted by the same number as one of the column 
headings. 

For example, if the columns headed 3 and 8 are chosen, the entries 
(as far down as the end of the shorter column) and their partial prod- 
ucts are 


    p    Column 3    Column 8    Product
    0        1           1           1
    1        3           8          24
    2        3          28          84
    3        1          56          56
                                   ---
                          Sum      165


The sum of the two column headings being 3 + 8 = 11, the law 
expressed by (22) says that the sum of the partial products, 165, is 
equal to $C^{11}_{3}$ or $C^{11}_{8}$. By reference to Appendix III, it will be found
that each of these numbers is actually 165. 
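
The column rule just illustrated is easy to check mechanically. The following is only a sketch, assuming that the entries of Appendix III are ordinary binomial coefficients; the function names `mismatch_probability` and `partial_products` are introduced here for illustration and do not come from the text.

```python
from fractions import Fraction
from math import comb


def mismatch_probability(m: int, n: int, p: int) -> Fraction:
    """Formula (22): chance of exactly p incorrect cards in each pile."""
    return Fraction(comb(m, p) * comb(n, p), comb(m + n, m))


def partial_products(m: int, n: int) -> list[int]:
    """Products of corresponding entries of columns m and n,
    carried down to the end of the shorter column."""
    return [comb(m, p) * comb(n, p) for p in range(min(m, n) + 1)]


products = partial_products(3, 8)
# Partial products, their sum, and the two equal entries of column 11.
print(products, sum(products), comb(11, 3), comb(11, 8))
# The probabilities (22) for every admissible p must add up to unity.
print(sum(mismatch_probability(3, 8, p) for p in range(4)))
```

Exact rational arithmetic is used so that the sum of the probabilities can be compared with unity without rounding.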

In stating the law the upper index on the sign of summation has 
been written as "m or n," meaning thereby the smaller of these two
integers. Suppose for the moment that the smaller integer is m.
Then as soon as p exceeds m, the binomial coefficient $C^{m}_{p}$ vanishes.
Hence, if the number of terms were extended beyond the limit fixed
by the smaller integer, the result would be merely to add on a certain
number of zeros. This, of course, would not in any way affect the
sum. In other words, the upper limit of summation can be taken as
anything at all, provided it be not less than the smaller of m and n.
Thus


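$$\sum_{p=0}^{m} C^{m}_{p}\,C^{n}_{p} = C^{m+n}_{m},$$

$$\sum_{p=0}^{n} C^{m}_{p}\,C^{n}_{p} = C^{m+n}_{m},$$

$$\sum_{p=0}^{\infty} C^{m}_{p}\,C^{n}_{p} = C^{m+n}_{m}$$
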
are all equally valid. 

A word of caution should be added about one step in the process 
by which this result has been obtained. The transfer of $C^{m+n}_{m}$ from
the left-hand side of the equation to the right-hand side was accom- 
plished by multiplying both sides by this quantity. In doing this the 
fact was stressed that each individual term on the left-hand side 
must of necessity be multiplied by the same number, otherwise the 
result would be incorrect. This means, so far as the shorthand 
notation is concerned, that the general term 


$$\frac{C^{m}_{p}\,C^{n}_{p}}{C^{m+n}_{m}}$$


can be multiplied by anything whatever which does not vary with p; 
that is, which does not change from term to term. The factor in the 
denominator does not involve p and therefore could be taken outside 
the sign of summation by this process. But if the general term had 
been, for example, 
$$\frac{C^{m}_{p}\,C^{n}_{p}}{C^{m+n}_{p}},$$

it would have been quite improper to multiply each term by $C^{m+n}_{p}$
and arrive at the result

$$\sum_{p=0}^{m \text{ or } n} C^{m}_{p}\,C^{n}_{p} = C^{m+n}_{p}.$$


In fact, this last equation not only is not correct, but it cannot even 
be interpreted, a fact concerning which the student can easily satisfy 
himself by attempting to assign numerical values to the letters. 


§ 25. Some Instructive Illustrations; The Problem of
Independent Trials


A very fundamental class of compound events is that in 
which the chance of an event occurring is not in any way 
influenced by what has already occurred: that is, where 
every event is quite independent of the rest. Dice problems 
are of interest in the subject of probability largely because 
they typify this class of events. The following is a simple 
example: 


Example 27.—What is the chance of throwing an ace exactly once in
six throws of a die?


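This problem can be solved by the use of alternative
conditional probabilities. The possible cases are that the single
ace appears on the first throw only, on the second throw only, and
so on through the sixth throw.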


Taking the first case, the probability of throwing an ace
on the first throw is $\tfrac{1}{6}$. The probability of not throwing an
ace on any of the remaining throws is $(\tfrac{5}{6})^{5}$, since the events
are all independent. The probability of the first compound
event is therefore $\tfrac{1}{6}(\tfrac{5}{6})^{5}$.

In the second case the answer is obtained in the form
$\tfrac{5}{6}(\tfrac{1}{6})(\tfrac{5}{6})^{4}$, which is exactly the same as before except for the
order in which the fractions appear.

The remaining alternative cases can be similarly treated, 
and in each instance the answer comes out the same. Since 
these cases are all mutually exclusive the chance that one or
the other occurs, which is what the problem asks for, is
$6 \cdot \tfrac{1}{6}(\tfrac{5}{6})^{5} = (\tfrac{5}{6})^{5}$.

Let us see what this problem really teaches. Since the 
throws are independent the chance of throwing one ace and 
five "not aces" in some preassigned order is always $\tfrac{1}{6}(\tfrac{5}{6})^{5}$,
regardless of the order chosen. Hence it is necessary to add
together as many of these equal quantities as there are distinct
orders in which the result may appear. The number of
distinct orders, however, is just the number of permutations
of six things of which five are alike¹ and one different, that
is, $P^{6}_{5,\,1}$.

In general, if the probability of throwing exactly n aces in
m throws had been asked for, there would be as many possible
orders in which the result might occur as there are ways of
permuting n things of one kind and m − n of another, that is,
$m!/[n!\,(m-n)!]$. This happens to be numerically equal to $C^{m}_{n}$
and therefore can be conveniently represented by that symbol.
The solution of the general problem therefore consists of the
sum of $C^{m}_{n}$ terms, each corresponding to the probability of
throwing n aces and m − n "not aces" in some preassigned
order. Since the throws are independent each of these terms
is the product of n equal factors $\tfrac{1}{6}$ and m − n other equal
factors $\tfrac{5}{6}$; that is, $(\tfrac{1}{6})^{n}(\tfrac{5}{6})^{m-n}$. The answer is therefore

$$C^{m}_{n}\left(\tfrac{1}{6}\right)^{n}\left(\tfrac{5}{6}\right)^{m-n}.$$

It is now a simple matter to extend this formula into the 
following general theorem: 


If the probability of an event occurring in a single trial is p,
the chance that it occurs exactly n times in m INDEPENDENT trials is

$$P_{m}(n) = C^{m}_{n}\,p^{n}(1-p)^{m-n}. \tag{23}$$


This is one of the fundamental theorems of the Theory of 
Probability. It will receive full discussion later. 
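
As a small numerical check of (23), the sketch below evaluates the formula with exact fractions and applies it to Example 27. The function name `independent_trials` is an assumption made for this illustration only.

```python
from fractions import Fraction
from math import comb


def independent_trials(m: int, n: int, p: Fraction) -> Fraction:
    """Formula (23): chance of exactly n occurrences in m independent trials."""
    return comb(m, n) * p**n * (1 - p) ** (m - n)


ace = Fraction(1, 6)
# Example 27: exactly one ace in six throws of a die.
print(independent_trials(6, 1, ace))
# The chances of 0, 1, ..., 6 aces exhaust all possibilities.
print(sum(independent_trials(6, n, ace) for n in range(7)))
```

The first value agrees with the result $(\tfrac{5}{6})^{5}$ found above, and the second is unity, as it must be.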


¹ That is, "not aces." Their distinguishing characteristics are non-essential.




§ 26. Some Instructive Illustrations; A Generalization of the 
Problem of Independent Trials 


We can easily derive as a corollary to this theorem another
which is frequently useful. Speaking again in terms of the
dice problem which we have considered, but writing $n_1$ where
we previously wrote n, we note that the $m - n_1$ "not-aces"
are composed of deuces and other things. The chance that
there are just $n_2$ deuces among them is just

$$C^{m-n_1}_{n_2}\left(\tfrac{1}{5}\right)^{n_2}\left(\tfrac{4}{5}\right)^{m-n_1-n_2},$$

since the $m - n_1$ trials are certainly independent and the
probability of a deuce appearing is $\tfrac{1}{5}$ when ace is known not
to appear. We must not overlook the fact that what we
obtain in this manner is the conditional probability of $n_2$
deuces in m trials if there are known to be just $n_1$ aces. Hence
we conclude that the chance of just $n_1$ aces and $n_2$ deuces is
the product of this expression by the unconditional probability
of $n_1$ aces.

If we go another step we find that the conditional probability
of $n_3$ threes in m trials, if there are known to be just $n_1$ aces
and $n_2$ deuces, is

$$C^{m-n_1-n_2}_{n_3}\left(\tfrac{1}{4}\right)^{n_3}\left(\tfrac{3}{4}\right)^{m-n_1-n_2-n_3},$$

and by multiplying this factor in with the two already found
we can obtain the probability of exactly $n_1$ aces, $n_2$ deuces
and $n_3$ threes.

Carrying this process on step by step we eventually find
the probability of exactly $n_1$ aces, $n_2$ deuces, ..., $n_6$ sixes.
After common factors have been cancelled out the result is

$$P_m(n_1, n_2, \ldots, n_6) = P^{m}_{n_1, n_2, \ldots, n_6}\left(\tfrac{1}{6}\right)^{n_1}\left(\tfrac{1}{6}\right)^{n_2}\cdots\left(\tfrac{1}{6}\right)^{n_6},$$

it being understood, of course, that $n_1 + n_2 + \ldots + n_6 = m$.
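
The cancellation of common factors can be confirmed numerically. The sketch below multiplies the successive conditional factors face by face and compares the product with the closed form; the function names `chained_probability` and `closed_form` are hypothetical labels for this illustration, and equally likely faces are assumed throughout.

```python
from fractions import Fraction
from math import comb, factorial


def chained_probability(counts) -> Fraction:
    """Product of the successive conditional binomial factors."""
    remaining_trials = sum(counts)
    remaining_faces = len(counts)
    prob = Fraction(1)
    for n_i in counts:
        # Chance of this face when the earlier faces are known not to appear.
        p = Fraction(1, remaining_faces)
        prob *= comb(remaining_trials, n_i) * p**n_i * (1 - p) ** (remaining_trials - n_i)
        remaining_trials -= n_i
        remaining_faces -= 1
    return prob


def closed_form(counts) -> Fraction:
    """Number of permutations times (1/faces) raised to the number of throws."""
    m = sum(counts)
    arrangements = factorial(m)
    for n_i in counts:
        arrangements //= factorial(n_i)
    return arrangements * Fraction(1, len(counts)) ** m


counts = (2, 1, 0, 3, 1, 1)  # aces, deuces, ..., sixes in eight throws
print(chained_probability(counts), closed_form(counts))
```

The two printed values are the same fraction, whatever set of frequencies is chosen.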

This, too, is easily converted into a general theorem, though 
the fact that the six faces of the die are equally likely makes 
it rather difficult to guess what it is to be. For that reason, 
and also because the alternative form of proof is interesting
in itself, we sketch a proof of the general case by a different
line of argument.

If we have a complete and mutually exclusive set of events
the individual probabilities of which are $p_1, \ldots, p_s$, and if
we make m independent trials, the chance that the first event
occurs just $n_1$ times, the second $n_2$ times, and so on in a specified
order is just

$$p_1^{n_1}\,p_2^{n_2}\cdots p_s^{n_s},$$


no matter what order may be specified. Hence to get the 
chance of the first event occurring $n_1$ times, the second $n_2$
times, and so on, regardless of order, it is only necessary to
multiply this quantity by the number of possible permutations. 
This gives us immediately the theorem: 


If the events denoted by the subscripts 1, 2, ..., s are mutually
exclusive and form a complete set, and if their respective
probabilities of occurrence are $p_1, p_2, \ldots, p_s$, the chance that they
will occur with the frequencies $n_1, n_2, \ldots, n_s$ in $m = n_1 + \ldots + n_s$
independent trials is

$$P_m(n_1, n_2, \ldots, n_s) = \frac{m!}{n_1!\,n_2!\cdots n_s!}\;p_1^{n_1}\,p_2^{n_2}\cdots p_s^{n_s}. \tag{24}$$


If there are just two events this formula reduces to (23); 
while if there are six equally likely events it reduces to the 
result obtained in our consideration of the tossing of a die. 
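
A short sketch of (24) follows, with a check that it reduces to (23) when there are only two events. The function name `multinomial_probability` and the sample probabilities are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb, factorial


def multinomial_probability(counts, probs) -> Fraction:
    """Formula (24): chance of the given frequencies in sum(counts) trials."""
    coefficient = factorial(sum(counts))
    term = Fraction(1)
    for n_i, p_i in zip(counts, probs):
        coefficient //= factorial(n_i)   # builds m!/(n_1! n_2! ... n_s!)
        term *= Fraction(p_i) ** n_i
    return coefficient * term


# Two events: one ace and five "not aces" in six throws, as in Example 27.
two_events = multinomial_probability([1, 5], [Fraction(1, 6), Fraction(5, 6)])
print(two_events, comb(6, 1) * Fraction(1, 6) * Fraction(5, 6) ** 5)

# Three events of unequal probability, observed 2, 1 and 1 times in four trials.
print(multinomial_probability([2, 1, 1], [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]))
```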


§ 27. Some Instructive Illustrations; A Typical Urn Problem 


In dealing with the last example, considerable emphasis 
was placed upon the fact that the trials were independent of 
one another. That a different result is obtained if this condi- 
tion is not satisfied may be illustrated by the following example: 


Example 28.—An urn contains five red and ten black balls. Eight
of these are drawn out and placed in another urn. What is the chance
that the latter then contains two red and six black balls?


This example resembles the former one in that it might
be very simply stated as, What is the chance of drawing
exactly two red balls in eight trials? It differs from the
former in that the trials are not independent; that is, the
chance of drawing a red ball on the first attempt is $\tfrac{5}{15}$, while
the chance of drawing a red ball on the second attempt is
either $\tfrac{4}{14}$ or $\tfrac{5}{14}$ according as the first ball was red or black.
The solution to the problem, however, is not hard to
obtain. There are $P^{8}_{2,\,6} = C^{8}_{2}$ orders in which the two red
and six black balls may appear. If each of these orders is
separately considered the probability of drawing the balls in
exactly this order may be found. The sum of all these terms
will then be the desired answer. This appears to require the
computation of $C^{8}_{2} = 28$ terms, which would involve a
considerable amount of labor. Fortunately, however, the
consideration of a very few terms serves to show their law of
formation and makes the complete solution quite simple.
Three of the possible 28 orders are obviously

    R R B B B B B B
    R B R B B B B B
    B R R B B B B B


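Consider the first of these. The chance of choosing a red
ball first is $\tfrac{5}{15}$. If this is done, the chance of choosing a red
the next time is $\tfrac{4}{14}$. If both these events have taken place, the
chances that the next six balls are each black are¹ $\tfrac{10}{13}$, $\tfrac{9}{12}$,
$\tfrac{8}{11}$, $\tfrac{7}{10}$, $\tfrac{6}{9}$, $\tfrac{5}{8}$, respectively. Therefore the chance that all of
these events take place is

$$\frac{5}{15}\cdot\frac{4}{14}\cdot\frac{10}{13}\cdot\frac{9}{12}\cdot\frac{8}{11}\cdot\frac{7}{10}\cdot\frac{6}{9}\cdot\frac{5}{8}.$$
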
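If the second case is considered the answer takes the form

$$\frac{5}{15}\cdot\frac{10}{14}\cdot\frac{4}{13}\cdot\frac{9}{12}\cdot\frac{8}{11}\cdot\frac{7}{10}\cdot\frac{6}{9}\cdot\frac{5}{8},$$

while if the third group is considered the result is

$$\frac{10}{15}\cdot\frac{5}{14}\cdot\frac{4}{13}\cdot\frac{9}{12}\cdot\frac{8}{11}\cdot\frac{7}{10}\cdot\frac{6}{9}\cdot\frac{5}{8}.$$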


Now the remarkable thing about these three expressions 
is, that although the separate fractions differ, the product is 


¹ Note that these are all conditional probabilities, and are not equal one to another
as in the case of independent trials. 




the same for every case. As a matter of fact, this property 
is common to all the terms. Every time a ball is chosen, 
whether red or black, the number of balls remaining in the 
urn is reduced by one so that the denominator of the next 
conditional probability is also reduced by one. This means 
that no matter what the order of choice may be the array 
of factors in the denominator will always be the same. Like- 
wise whenever a red ball is chosen the numerator for the next 
conditional probability of a red ball appearing is reduced by 
one. But the numerator of the next conditional probability 
for a black choice remains unchanged. Thus to the red 
choices correspond fractions the numerators of which are 
always 5 and 4, while to the black choices correspond numer- 
ators 10, 9, 8, 7,6, and 5. As the permutation of these factors 
does not affect the magnitude of their product, it follows 
that every order of choice has the same probability as every 
other. In other words all orders are equally likely. The 
answer to the problem is therefore 


$$P^{8}_{2,\,6}\cdot\frac{5!\,10!\,7!}{3!\,4!\,15!} = \frac{140}{429}.$$


This same answer can also be obtained in another way. 
Suppose the balls are all tagged to establish their identity. 
There are then $P^{15}_{8}$ ways in which eight can be drawn out,
all of these ways being equally likely.¹ In order to solve the
problem by means of the fundamental definition of probability
it is only necessary to determine how many of these ways
represent just two red balls and six black balls. This number
may be found by observing that there are $C^{5}_{2}$ ways of choosing
two red balls for the group and $C^{10}_{6}$ ways of choosing six
black balls. This makes a total of $C^{5}_{2}\,C^{10}_{6}$ different combinations
of tagged balls which satisfy the condition of the problem.


¹ This may be taken, either as an intuitive fact, or as the result of computation;
for if an order is specified, the chance of drawing in just that order is

$$\frac{1}{15\cdot 14\cdot 13\cdot 12\cdot 11\cdot 10\cdot 9\cdot 8},$$

whatever the order may have been.




Each of these, however, is capable in itself of $P^{8}_{8}$ permutations.
Thus, the total number of permutations satisfying the conditions
of the problem is found to be $C^{5}_{2}\,C^{10}_{6}\,P^{8}_{8}$, and the answer to
the problem is obtained as

$$\frac{C^{5}_{2}\,C^{10}_{6}\,P^{8}_{8}}{P^{15}_{8}}.$$


This is easily reduced to the form already found. 
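
Both routes to the answer of Example 28 can be followed mechanically. The sketch below, which assumes nothing beyond the statement of the example, multiplies the conditional probabilities for every one of the 28 colour orders and compares the total with the count of tagged combinations; `order_probability` is a name chosen here for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def order_probability(order, reds: int = 5, blacks: int = 10) -> Fraction:
    """Chance of drawing the colours in exactly this order without replacement."""
    prob = Fraction(1)
    for colour in order:
        total = reds + blacks
        if colour == "R":
            prob *= Fraction(reds, total)
            reds -= 1
        else:
            prob *= Fraction(blacks, total)
            blacks -= 1
    return prob


# The distinct orders of two reds and six blacks.
orders = set(permutations("RRBBBBBB"))
per_order = {order_probability(order) for order in orders}
by_orders = sum(order_probability(order) for order in orders)
by_combinations = Fraction(comb(5, 2) * comb(10, 6), comb(15, 8))
print(len(orders), per_order, by_orders, by_combinations)
```

The set `per_order` contains a single value, which is the numerical form of the statement that all orders are equally likely.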

Just as in the preceding section, it is possible to generalize 
this solution and obtain a working theorem. Suppose the 
problem had read: 

An urn contains m red and n black balls. If p + q are drawn in
order and placed in another urn, what is the chance that the latter contains
just p red balls and q black ones?


If the balls were all tagged there would be $P^{m+n}_{p+q}$ equally
likely permutations. Any trial would give some one of these.
There would also be a total of $C^{m}_{p}\,C^{n}_{q}$ combinations of p red
and q black balls each of which would be capable within itself
of $P^{p+q}_{p+q}$ permutations. Therefore the total number of
permutations which satisfy the conditions of the problem and
therefore compose the desired subgroup is $C^{m}_{p}\,C^{n}_{q}\,P^{p+q}_{p+q}$.
Dividing the number of elements in the subgroup by the number
of elements in the complete group it is found that the
probability of drawing exactly p reds and q blacks is

$$P_{m,n}(p, q) = \frac{C^{m}_{p}\,C^{n}_{q}\,P^{p+q}_{p+q}}{P^{m+n}_{p+q}} = \frac{C^{m}_{p}\,C^{n}_{q}}{C^{m+n}_{p+q}}.$$


In words this theorem reads: 


If a group of m things of one kind and n things of another
exists, and if this group is reduced by eliminating one thing at a
time, the thing being chosen quite without respect to those
characteristics which differentiate kind from kind, the probability that
the first p + q stages will remove p things of the first kind and
q of the second is

$$P_{m,n}(p, q) = \frac{C^{m}_{p}\,C^{n}_{q}}{C^{m+n}_{p+q}}. \tag{25}$$




A useful relation among the binomial coefficients can be 
obtained from (25) by an argument exactly analogous to that 
used in § 24. Suppose we denote the total number of
withdrawals by r, so that we have the relationship p + q = r.
Among the r things withdrawn there must be, either none of 
the first kind and r of the second, or one of the first and r — 1 
of the second, or some other of the set of obvious possibilities. 
This leads us at once to the equation 


$$\sum_{p=0}^{r} P_{m,n}(p, r-p) = 1,$$

or, upon substituting (25) in this equation and noting that
the denominator is the same for every term,

$$\sum_{p=0}^{r} C^{m}_{p}\,C^{n}_{r-p} = C^{m+n}_{r}. \tag{26}$$


We shall find this formula of service in a later section. 
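
Formulas (25) and (26) lend themselves to a direct check. The following sketch uses the numbers of Example 28 as test values; the function names `removal_probability` and `coefficient_sum` are assumptions introduced for this illustration.

```python
from fractions import Fraction
from math import comb


def removal_probability(m: int, n: int, p: int, q: int) -> Fraction:
    """Formula (25): chance that p + q withdrawals remove p of the first kind and q of the second."""
    return Fraction(comb(m, p) * comb(n, q), comb(m + n, p + q))


def coefficient_sum(m: int, n: int, r: int) -> int:
    """Left-hand side of (26); coefficients beyond the sensible limits are zero."""
    return sum(comb(m, p) * comb(n, r - p) for p in range(r + 1))


print(removal_probability(5, 10, 2, 6))               # Example 28 again
print(coefficient_sum(5, 10, 8), comb(15, 8))         # the two sides of (26)
print(sum(removal_probability(5, 10, p, 8 - p) for p in range(9)))
```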


§ 28. Some Instructive Illustrations; Another Typical Urn 
Problem 


Suppose the example considered in the last section is 
modified to read: 


Example 29.—An urn contains five red and ten black balls. Eight
times in succession a ball is drawn out but it is replaced before the next
drawing takes place. What is the probability that the balls drawn
were red on two occasions and black on six?


Since the balls are replaced before the next drawing takes 
place the condition of the urn is always the same just before 
every trial, and therefore the chance of drawing a red ball 
or a black ball is the same for each of the trials. In other 
words, the trials are completely independent. The theorem
developed in § 25 therefore applies to this case.

The chance of drawing a red ball is $\tfrac{1}{3}$ and the chance of
drawing a black ball $\tfrac{2}{3}$. Hence the chance of drawing exactly
two reds and six blacks in eight trials is

$$C^{8}_{2}\left(\tfrac{1}{3}\right)^{2}\left(\tfrac{2}{3}\right)^{6} = \frac{1792}{6561}.$$




This answer is considerably smaller than that obtained 
when the balls were not replaced. Decimally, the answer to 
the present problem is 0.273 while the answer to the other 
was 0.326. 
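
The two decimal values quoted can be reproduced in a few lines. This is merely a numerical sketch of the comparison; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

# Example 29: each ball is replaced, so formula (23) applies with p = 1/3.
with_replacement = comb(8, 2) * Fraction(1, 3) ** 2 * Fraction(2, 3) ** 6

# Example 28: the balls are not replaced, so formula (25) applies.
without_replacement = Fraction(comb(5, 2) * comb(10, 6), comb(15, 8))

print(with_replacement, round(float(with_replacement), 3))
print(without_replacement, round(float(without_replacement), 3))
```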


§ 29. Some Instructive Illustrations; A Problem in Matching 


A problem of a greater difficulty, which needs repeated use 
of alternative compound probabilities for its solution, is the 
following: 


Example 30.—From a thoroughly shuffled pack of cards m are dealt
and laid in a row face down in the order of appearance. This pack is
then laid aside. Another pack is taken and m cards are dealt on top
of the first m. In this way m pairs of cards are obtained. The cards
in any pair may be either of like or of unlike color. What is the chance
that there are exactly n matched pairs?


This example bears an obvious resemblance to Examples 
21 and 26, but differs from them in a very important respect 
which makes it much more difficult to solve. In the previous 
problems the number of red and black cards was both known 
and known to be equal in both sequences, whereas in the 
present case the number of reds and blacks dealt from the 
first pack is neither known nor known to be equal to the 
number dealt from the second pack. 

As the number of reds dealt from the first pack is unknown, 
the natural thing to do is to assign a letter r to represent it.
So far as is known r may have any value between zero and m.

Next we denote by P(r) the chance of the assumed number
of reds being the true one; and by $P_r(n)$ the chance of n
matched pairs if it is. Then

$$P(n) = \sum_{r=0}^{m} P(r)\,P_r(n), \tag{27}$$

P(n) being the symbol chosen for the answer to the problem.
This is a use of (21).

Now obviously P(r) is a special case of formula (25), the
p and q of (25) being r and m − r, respectively, while both
m and n of (25) are 26: that is,

$$P(r) = \frac{C^{26}_{r}\,C^{26}_{m-r}}{C^{52}_{m}}. \tag{28}$$

Hence P(n) can be found if $P_r(n)$ can be found, so one of the
indefinite features of the problem has been removed. The
next step will eliminate another.

If there are exactly n matched pairs, there must be either n
pairs of red cards and no black pairs, or n − 1 red pairs and
one black pair, or some other combination of numbers having
the sum n. If, then, $P_r(k, n-k)$ is the probability of just
k red and n − k black pairs, when there are r reds in the
bottom row,

$$P_r(n) = \sum_{k=0}^{n} P_r(k, n-k). \tag{29}$$

The final step is to find $P_r(k, n-k)$. This is done by an
argument exactly like that of § 22.

The order in which the bottom row of cards appears is 
unknown; but whatever that order is, it may be called the 
“standard order.” On the other hand, the top row is some 
possible permutation of m cards chosen from the 52 cards of
the deck. If the separate identities of all these cards are taken
into account, the total number of permutations of this sort
is $P^{52}_{m} = 52!/(52 - m)!$. These permutations are all equally
likely, they are mutually exclusive, and they form a complete
set. Therefore the desired probability $P_r(k, n-k)$ can be
found by finding the subgroup which matches the standard
order in exactly k reds and n − k blacks. To find this number,
it is easiest to find the number of possible combinations, and
then the number of permutations of which each combination
is capable within itself. The product is the desired number
of permutations.

The k red cards which match reds in the standard order
can be chosen in $C^{26}_{k}$ ways, without taking account of order.
With them may be combined any one of $C^{26}_{r-k}$ combinations
of blacks. Then, from the remaining 26 − r + k blacks the
n − k which match can be chosen in $C^{26-r+k}_{n-k}$ ways; while the
(m − r) − (n − k) reds which fall on blacks can be chosen in
$C^{26-k}_{(m-r)-(n-k)}$ ways from among the 26 − k reds which remain.
The total number of combinations of cards which may be dealt
in the top row is

$$C^{26}_{k}\,C^{26}_{r-k}\,C^{26-r+k}_{n-k}\,C^{26-k}_{(m-r)-(n-k)}.$$

Those cards which stand in the red positions of the standard
order can be permuted among themselves in every possible way
without in any way affecting the number of matches. The
same is obviously also true of the cards which stand in the
black positions of the standard order. The number of
permutations of the first kind is $P^{r}_{r} = r!$ and the number of
permutations of the second kind is $P^{m-r}_{m-r} = (m - r)!$. When the
product of four C's written above is multiplied by these two
P's it gives the desired subgroup. Hence:


The probability of matching the standard order in exactly
k reds and n − k blacks if the standard order contains exactly
r reds and m − r blacks is¹

$$P_r(k, n-k) = \frac{C^{26}_{k}\,C^{26}_{r-k}\,C^{26-r+k}_{n-k}\,C^{26-k}_{(m-r)-(n-k)}\,P^{r}_{r}\,P^{m-r}_{m-r}}{P^{52}_{m}}. \tag{30}$$


¹ This formula has been worked out without the slightest reference to the limits
within which the numbers k and n − k must be confined. That there are such limits
is obvious. For instance, the number of red pairs cannot be negative, which means
that k must be greater than or equal to zero. Similarly, the number of black pairs
cannot be negative, which means that n − k ≥ 0 or k ≤ n. On the other hand the
number of red pairs cannot exceed the number of red cards in the standard order;
that is, k ≤ r, and the number of black pairs cannot exceed the number of blacks in
the standard order, which means that n − k ≤ m − r. If any one of these four
conditions is violated $P_r(k, n-k)$ must be zero.

We have already noted some cases where formulae of this sort automatically took
the correct value zero when the sensible limits upon the variable quantity were
transgressed. It is interesting to note that (30) actually vanishes when any of the above
conditions is violated. For instance, for k < 0, $C^{26}_{k}$ vanishes. If k > r, $C^{26}_{r-k}$
vanishes. If k > n, $C^{26-r+k}_{n-k}$ vanishes. If (n − k) > (m − r), $C^{26-k}_{(m-r)-(n-k)}$
vanishes.

If it were not for this fact, when we come to substitute the values of $P_r(k, n-k)$
in (29) it would be necessary to use formula (30) only for those values of k which lie
within what we have called sensible limits. However, since (30) actually takes the
correct value zero when these limits are transgressed, we may if we desire substitute
it algebraically into (29) without worrying about this point. This fact simplifies the
notation in equations (31), (32) and (33).




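All that now remains is to collect and simplify the results.
First, note that

$$\frac{P^{r}_{r}\,P^{m-r}_{m-r}}{P^{52}_{m}} = \frac{1}{C^{m}_{r}\,C^{52}_{m}}.$$

Then substituting this in (30), and (30) in (29), gives

$$P_r(n) = \sum_{k=0}^{n} \frac{C^{26}_{k}\,C^{26}_{r-k}\,C^{26-r+k}_{n-k}\,C^{26-k}_{(m-r)-(n-k)}}{C^{m}_{r}\,C^{52}_{m}}. \tag{31}$$

As the denominator does not vary with k, it may be taken
outside the sign of summation, giving

$$P_r(n) = \frac{1}{C^{m}_{r}\,C^{52}_{m}} \sum_{k=0}^{n} C^{26}_{k}\,C^{26}_{r-k}\,C^{26-r+k}_{n-k}\,C^{26-k}_{(m-r)-(n-k)}. \tag{32}$$

Finally, substituting (32) and (28) in (27), and factoring out
those terms which do not depend upon r, gives

$$P(n) = \frac{1}{(C^{52}_{m})^{2}} \sum_{r=0}^{m} \frac{C^{26}_{r}\,C^{26}_{m-r}}{C^{m}_{r}} \sum_{k=0}^{n} C^{26}_{k}\,C^{26}_{r-k}\,C^{26-r+k}_{n-k}\,C^{26-k}_{(m-r)-(n-k)}. \tag{33}$$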

This is the desired answer. 


§ 30. Some Instructive Illustrations; An Example in Compu- 
tation 


The formula (33) is quite complicated even when written in short- 
hand notation. To write it without the use of signs of summation 
would be next to impossible. Although the illustration itself has 
no great importance beyond its value as an example of a method of 
attack which must frequently be resorted to in complicated cases, 
it is probably wise to carry out the numerical computation of a special 
case in order, in the first place, to impress more clearly what the 
notation means, and in the second place, to show how it almost 
automatically builds up a scheme of numerical computation. 

To this end we choose the following special case of Example 30: 


Example 31.—If eight cards are dealt from each of two packs as
explained in Example 30, what is the chance that there are exactly four
matched pairs?


The solution of this special case is, of course, given by substituting
the values m = 8 and n = 4 in (33). There then remain two letters
r and k in the formula. According to the limits on the signs of
summation, r is to take every value from 0 to 8 and k is to take every
value from 0 to 4. In order to accommodate the numbers dependent
upon these two variables it is necessary to have a table, the columns
of which are headed 0, 1, 2, 3 and 4 to correspond to the values of
k and the rows of which are numbered from 0 to 8 to correspond
to the values of r. As a matter of fact more than one such table is
required before the computation is complete.


TABLE III

COMPUTATION OF $C^{26}_{k}\,C^{26}_{r-k}$

r          k=0          k=1          k=2          k=3          k=4
0            1
1           26           26
2          325          676          325
3        2,600        8,450        8,450        2,600
4       14,950       67,600      105,625       67,600       14,950
5       65,780      388,700      845,000      845,000      388,700
6      230,230                 4,858,750    6,760,000    4,858,750
7      657,800                             38,870,000   38,870,000
8    1,562,275                                         223,502,500


The first of these tables (Table III) is devoted to the computation
of the product

$$C^{26}_{k}\,C^{26}_{r-k}.$$

When k is zero the product reduces to $C^{26}_{r}$; therefore the first column
in the table is obtained by merely copying from Appendix III, taking
account of the fact, of course, that $C^{26}_{0}$ is 1. The second column is
the product of the constant factor $C^{26}_{1}$ by the variable factors $C^{26}_{r-1}$.
But $C^{26}_{1}$ is the second entry in the first column, while the variable
factors are themselves the successive entries in the same column,
except that they are displaced one unit because the subscript is
r − 1 instead of r. This means that each integer in the first column
is multiplied by 26, and the product recorded, not opposite the
integer multiplied, but one space lower down. Similarly the third
column is obtained by multiplying each element of the first column
by 325 and recording the results two places below the row from
which they were obtained. The other columns are computed in
a similar manner. Only five entries have been placed in each column,
since what we have called the "sensible limits" upon k show that
no other entries are needed.


The next step is the computation of $C^{26-r+k}_{4-k}$. This is carried out
in Table IV, and consists merely of entering in each column a number
taken from Appendix III.


TABLE IV

COMPUTATION OF $C^{26-r+k}_{4-k}$

r       k=0       k=1       k=2       k=3       k=4
0    14,950
1    12,650     2,600
2    10,626     2,300       325
3     8,855     2,024       300        26
4     7,315     1,771       276        25         1
5               1,540       253        24         1
6                           231        23         1
7                                      22         1
8                                                 1

The third step is the computation of $C^{26-k}_{4-r+k}$. This again consists
merely of writing down numbers taken from Appendix III. The
result is given in Table V.


TABLE V

COMPUTATION OF $C^{26-k}_{4-r+k}$

r       k=0       k=1       k=2       k=3       k=4
0    14,950
1     2,600    12,650
2       325     2,300    10,626
3        26       300     2,024     8,855
4         1        25       276     1,771     7,315
5                   1        24       253     1,540
6                             1        23       231
7                                       1        22
8                                                 1


It will be noted that the numbers omitted
at the bottom of each column, like those omitted at the tops of the
columns of Table III, are all zero. It is useless, therefore, to compute
the factors by which these zeros would be multiplied in evaluating
(33). This fact accounts for the remaining empty spaces in the
tables.

It is next necessary to multiply together the corresponding entries 
in these three tables. The results are given in Table VI. 

We are now ready to carry out the summation of these terms with
respect to k as indicated in (33). This means summing up the
entries in each row of Table VI; the results of which are recorded
in the sixth column, under the heading $\Sigma_k$.


TABLE VI

COMPUTATION OF $C^{26}_{k}\,C^{26}_{r-k}\,C^{26-r+k}_{4-k}\,C^{26-k}_{4-r+k}$

r           k=0           k=1           k=2           k=3           k=4    $\Sigma_k$
0  2.23503×10^8                                                          2.23503×10^8
1  8.55140×10^8  8.55140×10^8                                            1.71028×10^9
2  1.12237×10^9  3.57604×10^9  1.12237×10^9                              5.82078×10^9
3  5.98598×10^8  5.13084×10^9  5.13084×10^9  5.98598×10^8               1.14589×10^10
4  1.09359×10^8  2.99299×10^9  8.04609×10^9  2.99299×10^9  1.09359×10^8 1.42508×10^10
5                5.98598×10^8  5.13084×10^9  5.13084×10^9  5.98598×10^8 1.14589×10^10
6                              1.12237×10^9  3.57604×10^9  1.12237×10^9  5.82078×10^9
7                                            8.55140×10^8  8.55140×10^8  1.71028×10^9
8                                                          2.23503×10^8  2.23503×10^8


At this stage of the computation k disappears; that is, the numbers
denoted by $\Sigma_k$ depend upon r only. Therefore the remainder of the
computation can be carried out in a single table. 

The first column in this table (Table VII) contains the values of
$C^{26}_{r}$. The next column contains the products $C^{26}_{r}\,C^{26}_{8-r}$, which are
obtained by multiplying together the first and last entries in the
first column, the second from the top and the second from the bottom,
the third from the top and the third from the bottom, and so on. It
is obvious that this column is symmetrical about the center of the
column. Hence it is only necessary to write the first five numbers.

The third column contains $C^{8}_{r}$. It is now necessary to multiply
$\Sigma_k$ as given in Table VI by the corresponding number in the second
column of Table VII, and divide it by the corresponding entry in the
third column of Table VII. In using a modern computing machine
it is easier to carry out both of these operations without writing down
the intermediate result than otherwise. Therefore, the next column
in Table VII contains the result of both operations, that is,

$$\frac{C^{26}_{r}\,C^{26}_{8-r}}{C^{8}_{r}}\,\Sigma_k.$$


As $\Sigma_k$ and $C^{8}_{r}$ also are symmetrical about the middle of the column,
all these computations are carried out for the top five entries only.

Finally (33) requires the addition of the numbers in this last column
for every value of r from 0 to 8. The sum thus obtained is written
at the bottom of the column. P(n) is then obtained simply by
dividing this sum $\Sigma_r$ by the square of $C^{52}_{8}$ as taken from Appendix
III. This completes the solution of the problem.


TABLE VII

FINAL COMPUTATION OF THE DESIRED PROBABILITY

r    $C^{26}_{r}$    $C^{26}_{r}\,C^{26}_{8-r}$    $C^{8}_{r}$    $(C^{26}_{r}\,C^{26}_{8-r}/C^{8}_{r})\,\Sigma_k$
0            1      1.5623×10^6       1      3.4917×10^14
1           26      1.7103×10^7       8      3.6563×10^15
2          325      7.4825×10^7      28      1.5555×10^16
3        2,600      1.7103×10^8      56      3.4996×10^16
4       14,950      2.2350×10^8      70      4.5501×10^16
5       65,780
6      230,230
7      657,800
8    1,562,275
                              $\Sigma_r$ = 1.5461×10^17
                       $(C^{52}_{8})^{2}$ = 5.6631×10^17


Answer P(4) = 0.27302.
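
The whole tabular scheme can be condensed into a short program. The sketch below evaluates (33) directly with exact fractions; the helper `C`, which returns zero outside the sensible limits, and the function name `matching_probability` are assumptions made for this illustration and are not part of the text.

```python
from fractions import Fraction
from math import comb


def C(upper: int, lower: int) -> int:
    """Binomial coefficient that vanishes outside the sensible limits."""
    return comb(upper, lower) if 0 <= lower <= upper else 0


def matching_probability(m: int, n: int, half: int = 26) -> Fraction:
    """Formula (33): chance of exactly n matched pairs among m pairs of cards."""
    outer = Fraction(0)
    for r in range(m + 1):
        # Inner sum over k: the row totals of Table VI.
        inner = sum(
            C(half, k) * C(half, r - k)
            * C(half - r + k, n - k) * C(half - k, (m - r) - (n - k))
            for k in range(n + 1)
        )
        # Weight by the second column of Table VII divided by the third.
        outer += Fraction(C(half, r) * C(half, m - r) * inner, C(m, r))
    return outer / C(2 * half, m) ** 2


answer = matching_probability(8, 4)
print(float(answer))
# The chances of 0, 1, ..., 8 matched pairs together must equal unity.
print(sum(matching_probability(8, n) for n in range(9)))
```

Because the helper returns zero whenever a coefficient falls outside its limits, the loops may run over the full ranges of r and k without any special treatment of the empty spaces in the tables.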


§ 31. Some Instructive Illustrations; Another Urn Problem 


An urn problem of somewhat different type from those 
which have been previously considered is the following: 


EXAMPLE 32. An urn contains m black and n white balls. These 
are drawn out one at a time and placed in a separate container. The 
drawing is continued until all those balls which remain are of the same 
color. What is the chance that they are all black? 


Suppose the problem were modified by requiring the drawing 
to continue until only one ball remained. If this were done, 
what would be obtained would be some one of the possible 
permutations of m black and n white things, the remaining 
ball being the last member of the permutation. The number 
of such permutations in which the last ball is black is obviously 
equal to 

$$P_{m-1,n}$$

and as the permutations are all equally likely to appear the 
chance that the last remaining ball is black is 

$$P = \frac{P_{m-1,n}}{P_{m,n}} = \frac{m}{m+n}.$$


This same result could have been obtained by noting that any 
of the m + n balls is equally likely to be left as any other one, 
and m of them are black. 

This is also the answer to the problem as originally stated, 
a fact which becomes evident upon noting that whenever the 
last ball on this sort of drawing is black the group of balls 
which remain after the contents of the urn has been so far 
reduced as to be all of one color must of necessity be black. 
On the contrary, if the last ball happens to be white, the 
residual contents of the urn must likewise be white. 
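
The result m/(m + n) can be checked by brute force for small urns. The sketch below is only an illustration: the function name and the letters "b" and "w" are assumptions made for this example, and every distinct order of the balls is treated as equally likely, as in the argument above.

```python
from fractions import Fraction
from itertools import permutations

def chance_remainder_black(m: int, n: int) -> Fraction:
    """Enumerate every distinct order of m black ('b') and n white ('w') balls."""
    orders = set(permutations("b" * m + "w" * n))
    favourable = 0
    for order in orders:
        # Draw one ball at a time until the balls left in the urn are of one color.
        for drawn in range(m + n):
            rest = order[drawn:]
            if len(set(rest)) == 1:
                favourable += rest[0] == "b"
                break
    return Fraction(favourable, len(orders))

print(chance_remainder_black(3, 2))  # compare with m / (m + n)
```

The enumeration grows factorially, so it is only practical for a handful of balls; its purpose is to confirm the shortcut, not to replace it.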


§ 32. Some Instructive Illustrations; Another Urn Problem 


The solution of the following problem—like that of the 
last—involves no great difficulty if good judgment is used in 
the choice of the method of solution, but it might be very hard 
otherwise. 


EXAMPLE 33. An urn contains m white balls and n red balls, 
m being greater than n. These are drawn out one at a time and placed 
in a second container. The drawing is continued until all the balls 
have been transferred. What is the chance that throughout this process 
there are always more white than red balls in the second container? 


It is obvious that any drawing will result in some one of 
the $P_{m,n}$ possible permutations of m white and n red balls. 
These permutations are all equally likely and form the complete 
group necessary for the application of the fundamental 
definition of probability. 

To find the number of those permutations which satisfy 
the condition that there are always more white than red balls 
in the second container is the next step in the solution. This 
can best be done by finding the number of permutations 
which do not satisfy this condition and subtracting that from 
the original number. 

Any permutation which does not satisfy the conditions of 
the problem must belong to one of two classes: the class in 
which the first ball drawn is red, or the class in which the 
first ball drawn is white. These two classes can be considered 
separately. 

Obviously, every permutation beginning with a red ball 
violates the condition of the problem; because, after the first 
ball is drawn, there are more red than white balls in the second 
container. The number of such permutations, however, is 
simply the number of ways in which the remaining n − 1 reds 
and m whites may appear; that is, $P_{m,n-1}$. 

Some of the permutations which begin with white balls 
satisfy the conditions of the problem and some do not. At 
some stage of the process those which do not must have either 
the same number or a greater number of red than white balls 
in the second container. If there is a greater number, then 
at some subsequent stage the number must be equal, because 
when the drawing is complete the number of white balls 
exceeds the number of red. Therefore, finding the number of 
permutations which at some time have an equal number of 
red and white balls is the same as finding the permutations 
which violate the conditions of the problem. 

This is most easily done by an artifice. The permutations 
under discussion are of the general type 


w w r w r r | r w | w r | w w w. 


At the stage in the drawing indicated by each of the lines there 
are the same number of red and white balls in the second 
container. This may occur more than once, as in the permutation 
given, but however many times the numbers may 
become equal there must always be a last time. Suppose, 
now, that for purposes of argument a new permutation is 
built up which, after the last equality is reached, is identical 
with the one under consideration, but before the last equality 
is reached has red balls where the white ones were and white 
ones where the red ones were. In the case of the above illustrated 
permutation, for instance, this would lead to 


r r w r w w | w r | r w | w w w. 


This again is a possible permutation of m white and n red balls. 
Also it begins with a red ball. Hence, to every permutation 
which begins with a white ball and violates the conditions of 
the problem there corresponds by this inversion process a 
permutation which begins with a red ball. Furthermore, 
whenever the permutation beginning with a white ball is 
changed a different permutation beginning with a red ball is 
obtained. From this it follows at once that the number of 
permutations which begin with white balls and violate the 
conditions of the problem cannot exceed the number of per- 
mutations which begin with red balls. This can be expressed 
symbolically by saying that 


$$p_w \le p_r.$$


Every permutation which begins with a red ball must 
somewhere have an equal number of reds and whites, since 
after the drawing is completed there are more whites than reds. 
Therefore, the process which was carried out above can be 
inverted, leading to the conclusion: For every permutation 
which begins with a red ball there is a permutation beginning 
with a white ball and violating the conditions of the problem. 
This means, of course, that the number of permutations 
beginning with red balls cannot exceed the number of permu- 
tations beginning with white balls and violating the conditions 
of the problem; or in symbols, 

$$p_r \le p_w.$$

When this inequality is compared with the one above it is 
found that both can be satisfied only when $p_r = p_w$. But $p_r$ 
has already been found to be $P_{m,n-1}$; hence it follows at once 
that the total number of permutations which violate the 
conditions of the problem is 

$$2P_{m,n-1}.$$

Those which remain form the subgroup which is required 
by the fundamental definition of probability. Their number is 

$$P_{m,n} - 2P_{m,n-1}.$$

Therefore the answer to the problem is obtained in the form 

$$P = \frac{P_{m,n} - 2P_{m,n-1}}{P_{m,n}} = \frac{m-n}{m+n}.$$

The last two examples have been given to illustrate the 
extent to which it is sometimes necessary to augment the 
routine processes of probability theory by common-sense 
methods in obtaining solutions of comparatively simple 

problems. 
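
The answer to Example 33 can also be confirmed by direct enumeration for small m and n. This is a rough sketch under stated assumptions: the function name is hypothetical, the permutations are counted as C(m + n, n) distinct orders, and a running lead (whites minus reds in the second container) is used in place of the inversion argument of the text.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def always_more_white(m: int, n: int) -> Fraction:
    """Exact chance that white leads red after every draw (m white, n red)."""
    total = comb(m + n, n)          # number of distinct orders of the balls
    good = 0
    for red_positions in combinations(range(m + n), n):
        reds = set(red_positions)
        lead = 0                    # whites minus reds in the second container
        for i in range(m + n):
            lead += -1 if i in reds else 1
            if lead <= 0:           # condition violated at this draw
                break
        else:
            good += 1               # white led after every single draw
    return Fraction(good, total)

print(always_more_white(8, 5))  # compare with (m - n) / (m + n)
```

With m = 8 and n = 5 this covers the permutation illustrated in the text, which is one of the orders rejected by the loop.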


PROBLEMS 


1. Make use of the results of Problems 5, 6 and 7 of § 21 to obtain 
the solution of Example 24. 


2. In the first paragraph of § 23 it is stated that the unconditional 
probability of any black card being incorrectly called is 1/2. Since there 
are four black cards the chance that either the first or the second or 
the third or the fourth is incorrectly called works out to be 
1/2 + 1/2 + 1/2 + 1/2 = 2. What is wrong with this argument? 


3. If p +'s and n −'s are distributed at random along a line, 
what is the chance that no two −'s are adjacent? 
Tr 
4. By the use of (25) prove that & COGS Bas 


p=0 
5. Show that Convention II, which led to (21), makes possible the 
solution of Example 16. 


REFERENCES FOR OUTSIDE READING 


1. Coolidge: Probability, pp. 13-25. 

2. Chrystal: Algebra, pp. 571-586. 

3. Todhunter: Algebra, pp. 447-461. 

4. Bachelier: Calcul des Probabilités, pp. 1-8. 

5. Czuber: Wahrscheinlichkeitsrechnung, Vol. 1, pp. 28-72. 


CHAPTER IV 
PROBABILITY AND EXPERIMENT; BERNOULLI'S THEOREM 


§ 33. Introductory Remarks 


The reader who has dealt at all with statistics will already 
have remarked that nothing whatever has been said about the 
frequency with which an event will happen. It has been said 
that in tossing a penny heads is “equally likely” to appear 
as tails; never that “heads will appear as often as tails.” The 
reason is, that the latter statement is not true. Try it and see. 
It is not even true “in the long run,” unless that phrase is 
given a rather unusual shade of meaning which is dangerously 
near begging the question. 

Neither is it true that “in a large number of independent 
runs, heads will as often exceed tails as tails will exceed heads”; 
for that is merely shifting attention from the event “a single 
throw” to the event “a run,” and weak logic is never made 
stronger by obfuscation. Again the answer is, “Try it and 
see.” 

If these things were true, probability would be an experi- 
mental science, which I am quite convinced it is not. It is 
true that the outcome of an experiment may change the 
probability that a penny is bad—any accretion of pertinent 
information does that. If a large number of throws showed 
twice as many tails as heads, and we were carrying the weak 
end of a bet, we would probably insist on changing the penny. 
It would be more probable, because of the experiment, that the 
penny was loaded than it was before the experiment was 
performed. We may even have undertaken the experiment 
with a view to finding out whether the penny is good or bad; 
and if bad, how bad. But if so, in a strict logical sense, we 
will never find out. Not only will we never find what the 
probability of tails is (that is, how bad the penny is); we can 
never even answer with finality the question, Is it bad?; for 
our result, no matter how one-sided it may be, would not 
have been impossible with a good penny. We will be led to a 
presumption that the penny is bad, and even to a presumption 
that tails is approximately so-and-so much more likely than 
heads; but to nothing more. 

Why these things are true—that is, why we cannot deter- 
mine the magnitude of a probability from experiment, and why 
we can nevertheless use experiment as a practical means for 
approximate evaluation of probabilities—is, broadly speaking, 
the subject of this chapter. Naturally the answer can best be 
given after suitable foundations have been laid for it. So 
with the question in our minds we will proceed to the founda- 
tion-building. For this purpose certain rather elementary, 
but perhaps not very familiar, mathematical ideas must be 
introduced. 


§ 34. Limits and Things which Approach Them 


Let us take, as a very simple example, the function $y = x^2$, 
or its graph (Fig. 4). We say that “y approaches the limit 
zero as x approaches zero.” Just what do these words 
mean? 

Do they mean that y is always zero? Certainly 
not; y is the ordinate to the curve. Do they mean 
y is ever zero? I suppose, since y is zero when x is 
zero, that the temptation may be to answer this question 
affirmatively: if it is not, so much the better. But we will 
suppose that temptation to exist, for the sake of overcoming 
it by another example, a trite one. 

Fig. 4. 


A man has a thread a foot long. He cuts off half of it. 
Then he cuts off half the remainder, and so on. What he has 
left after successive cuts then gives us the sequence 


$$\tfrac{1}{2},\ \tfrac{1}{4},\ \tfrac{1}{8},\ \tfrac{1}{16},\ \tfrac{1}{32},\ \ldots$$


Again, his remnant “ approaches zero as time goes on.” Is 
it ever zero? Obviously not. 

There is, then, a real difference between the thing which 
approaches a limit, and the limit which it approaches. There 
may be special circumstances when the two merge, as in our 
first example, or there may not, as in our second. 

It may even happen, that at just the point where we 
expect the variable to reach its limit it misbehaves and takes 
a different value. Consider, for example, Fig. 5, in which a 
function of x is represented which obviously approaches 
the limit +1 as x approaches zero from the left, and −1 as it 
approaches zero from the right. It is not quite apparent what 
to expect it to do at x = 0. But such a function can be 
represented by an infinite series of sine functions (a Fourier 
Series) and when it is so represented we learn the surprising 
fact that at x = 0 the function takes the value zero, a value 
which it certainly does not approach from either side.¹ 

Fig. 5. 

¹ This result is a consequence of using the Fourier series, not an inherent property 
of two-line segments that fail to meet. The point is, not that the function must 
so behave, but that it may. 

There is a further distinction between our examples: Figs. 
4 and 5 define y for any value of x: for x = 0.1, for example. 
But in the thread illustration “the length of the remnant 
after one-tenth of a cut has been made” is nonsense. Calling 
N the number of cuts already made, and L the length of 
the remnant, this illustration would lead to a graph like that 
of Fig. 6, with ordinates at every integer point along the 
N-axis, and nothing in between. This is expressed by saying 
that the values of y in Figs. 4 and 5 constitute a “continuum,” 
and those of Fig. 6 a “discrete set.” Obviously either kind can 
approach a limit; and either kind can fail to reach that limit. 

Fig. 6. 

It happens that we are to be interested primarily in discrete 
sets: hence we phrase our definition of “approaching a limit” in 
the way most appropriate to that use. It is as follows: 


If an ordered set of numbers is of such a nature that, having 
chosen in advance a number ε as small as we like, but not zero, 
and having cancelled some finite number of numbers from the set, 
we can assert that no two of the remainder differ by as much as ε, 
the set has a limit. 


Let us take, for example, the set 

$$1,\ \tfrac{1}{2},\ \tfrac{1}{4},\ \ldots$$

If a particular value of ε is chosen, there is some power of 
1/2 smaller than ε. Thus, if ε is 0.000001, $(\tfrac{1}{2})^{20} < \epsilon$. But 
whatever ε is, some power N can be found such that $(\tfrac{1}{2})^{N} < \epsilon$, 
this N being a finite number. If we cancel the first N terms 
of our set, no two of the remainder differ by as much as ε. 
Hence the set approaches a limit. 
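
The search for N can be written out explicitly. The following is a small sketch, assuming the set is 1, 1/2, 1/4, ... indexed from the zeroth power; the function name is made up for this illustration.

```python
from fractions import Fraction

def terms_to_cancel(eps: Fraction) -> int:
    """Smallest N with (1/2)**N < eps, for the set 1, 1/2, 1/4, ..."""
    n = 0
    while Fraction(1, 2 ** n) >= eps:
        n += 1
    return n

N = terms_to_cancel(Fraction(1, 1_000_000))
# After the first N terms are cancelled, the largest survivor is (1/2)**N and
# every later term is positive, so any two survivors differ by less than eps.
print(N)
```

Exact fractions are used so that the comparison with ε is not disturbed by rounding.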
There is a defect to this definition: it does not tell us what 
“the limit” is. It merely tells what we mean by approaching 
a limit. This defect can be removed in one or the other of two 
ways: 
(a) If the set contains a number q which need not be cancelled 
no matter how small ε is made, q is the limit. 
(b) If to the set a new number q is added, and it is found 
that q need not be cancelled no matter how small ε is made, 
q is the limit. 
Obviously, in the first case the set contains its limit or, in 
technical language “is closed”; while in the second case the 
limit is not contained in the set, or the set “is open.” 

Our example is a case of an “open set”; for if we try any 
member of the set, say, $(\tfrac{1}{2})^{N}$, we can choose ε so small that it 
must be excluded: for example, $\epsilon = (\tfrac{1}{2})^{N+1}$ would do. But 
if we arbitrarily include zero in the set, zero need never be 
cancelled. Hence our set is an “open set” with the “limit 
zero” which it “does not contain.” 


§ 35. The Upper Bound of a Set 


Among a finite set of numbers, one possesses the property 
of being at least as big as any other. It is the “largest.” 
Sometimes there are several which possess this property. If 
so there are several “largest” numbers, as in the set 1, 2, 7, 7, 7. 

This statement is only safe of finite sets, however. Among 
an infinity of numbers there need not be any largest number. 
There are two ways in which this rather unexpected result can 
come about. They can be very simply illustrated by the 
following examples: 

There is no largest integer among the infinity of positive 
integers, for no matter what integer may be chosen there is 
always one larger than this. This is a case where the numbers 
contained in the set do not have any limit as regards size. 
That is, they are not “ bounded.” 

The set of fractions 1/2, 2/3, 3/4, ..., all of which are of the 
form m/(m + 1), does not contain a largest fraction; for if it 
did, this fraction could be obtained by giving m some particular 
value M. Whatever M may be, however, m = M + 1 always 
results in a larger fraction: that is, (M + 1)/(M + 2) is bigger 
than M/(M + 1). Hence there is no largest fraction. This 
is a case where the set of numbers is actually “bounded”; 
there is no number bigger than 1. If, however, the integer 1 
is included in the set, it is the biggest number. The set is 
now “closed” whereas before it was “open.” In either case 
1 is called the “least upper bound” of the set, the only difference 
being that the closed set contains its least upper bound 
(which is also the limit of the sequence); the open set does not. 
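
The two comparisons used in this paragraph can be verified in one line each. The block below is only a worked restatement, with M standing for any positive integer as in the text.

```latex
\frac{M+1}{M+2} - \frac{M}{M+1}
  = \frac{(M+1)^2 - M(M+2)}{(M+1)(M+2)}
  = \frac{1}{(M+1)(M+2)} > 0,
\qquad
1 - \frac{M}{M+1} = \frac{1}{M+1} > 0 .
```

The first relation shows that every fraction is exceeded by its successor; the second shows that none of them reaches 1.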


Among numbers which represent probabilities there can 
be no unbounded sets, for such numbers are inherently contained 
between zero and one. But there may be either open 
or closed sets. In fact, the sequence 1/2, 2/3, 3/4, ..., is capable 
of interpretation in terms of probability. If an infinity of 
urns are provided each containing one white ball, and if in 
one of these urns is placed one black ball, in another two 
black balls, in a third three black balls and so on, the chance 
that a ball drawn at random from one of the urns is black is 
either 1/2 or 2/3 or 3/4 ... or m/(m + 1), according to the urn from 
which the ball is drawn. It is obvious from an intuitive 
standpoint that there is no urn in the set for which the probability 
is as great or greater than for all the remaining urns. 
The set is “open.” 

If, however, an additional urn is provided with black balls 
only, this will have the probability 1, which is actually greater 
than for any other urn. The set is now “closed.” 

In either of these cases 1 is called the “least upper bound” 
of the probability. In the second case it is reached (by 
the urn with only black balls), in the first case it is not. 

It is a general theorem regarding sets of numbers that a 
closed set always contains its least upper bound (which is 
then a synonym for “largest number”), while an open set 
may, but need not. An example of an open set which does 
not has already been given. An open set which does (and 
which can easily be interpreted in the sense of probability) is 
1, 3/4, 2/3, 5/8, 3/5, ..., which does not contain its limiting value 1/2, 
though it does contain a number 1 greater than any other 
number in it. 

Now it happens that almost all statistical studies are 
postulated upon open sets which approach their upper bound 
as a limit but do not contain it, and a great deal of logical 
confusion has arisen from loose thinking about them. Hence it 
is essential to obtain a clear picture of the significance of the 
upper bound in such cases. 

As a step in this direction, take the set of events already 
mentioned: the drawing of a black ball from the various urns 
each having just one white ball. The set of numbers is 
1/2, 2/3, 3/4, ..., and its upper bound (which is not reached) 
is 1. Can it be asserted that, by choosing wisely the urn 
from which we draw, we can be assured of obtaining a black 
ball? It cannot. So long as the urn contains a white ball, as 
all the urns do, the outcome is unknown. The most we can 
say is, that by passing from urn to urn the importance of our 
ignorance can be made less and less. The limiting condition 
is certainty, but that limit cannot be reached. 

Choose your urn: make it as far along in the sequence as you 
like. I'll take the first one. We draw. We want black, but 
neither is certain to get it. The only difference lies in the 
importance of our states of ignorance; yours is of less moment 
than mine. 

“But,” you say, “if we draw repeatedly, my superiority will 
manifest itself.” I have already said weak logic cannot be 
strengthened by obfuscation; but I'll follow you this once 
into what, if we kept on, would soon become an infinite regres- 
sion. Name your number of trials. Make it as big as you 
like. You cannot be sure you will not draw white every time, 
and I black. But we'll agree it is extremely improbable. 

What I am aiming to make clear is that “extremely improbable” 
is as far as we can go; and that that is not arrived at by 
trial, but by judgment in advance. A trial can only tell us 
what has happened; our intelligence can go further than that 
and say whether it was miracle or not. 

§ 36. Regarding Probability as a Limit 

There are certain students of the subject who define the 
probability of a head appearing when a penny is tossed, as the 
limit of the ratio of the number of heads to the number of 
throws, as the number of throws is increased indefinitely. 
Such a definition, however, implies as a fundamental postulate 
that the ratio obtained in this way actually approaches a limit. 
If this were so, such a definition would be logically possible, 
and I think would have considerable superiority over the one 
which we have set up; for we have already called attention 
to the fact that there are many situations to which our definition 
cannot be applied; whereas the limit definition could be 
applied equally well¹ to almost any situation. The trouble is, 
that the fundamental postulate is only tenable provided the 
trials are not independent,² whereas the definition implies, even 
when it does not explicitly state, that the trials are independent. 

To see how this inconsistency arises, let us consider the 
matter of tossing pennies, and let $h_n$ denote the number of 
heads observed in n tosses. We form, in particular, the 
sequence of ratios, 

$$\frac{h_1}{1},\ \frac{h_2}{2},\ \frac{h_3}{3},\ \ldots,\ \frac{h_n}{n},\ \ldots$$

stretching out toward infinity as throw after throw is made. 
The definition which we are criticizing affirms that this sequence 
approaches a limit. But we can only properly make 
such an affirmation provided, after a finite number of terms 
have been eliminated, we are assured that no two of the 
remainder differ by more than some pre-assigned quantity ε. 
In fact, it must be possible to make this assertion for any ε, 
no matter how small; but for our purposes it is quite sufficient 
to consider only the value ε = 1/4. 

Let the last of the terms which are to be eliminated be 
the one corresponding to n = N − 1, N being, of course, 
any number we please. Then, if a limit exists, it must be 
possible to assert that no two of the terms which remain differ 
by as much as ε. If there is any pair about which this assertion 
cannot be made, it is impossible to say that the sequence 
approaches a limit. 

¹ And also equally badly. In theoretical discussions it would always be applicable, 
and in practice never, for we can never cause our number of trials to “approach 
infinity” in a practical sense. This difficulty is a superficial one, however. The 
utility of experiment is exactly the same, no matter which point of view we start from. 

² Unless “approaches a limit” means something logically different from its usual 
mathematical definition. 

Consider, now, the particular terms corresponding to 
n = N and n = 2N. If $h_N/N$ exceeds 1/2, and if it should 
happen to be true that every throw from N + 1 to 2N yielded 
a tail, it would be true that $h_{2N} = h_N$. Hence 

$$\frac{h_N}{N} - \frac{h_{2N}}{2N} = \frac{1}{2}\,\frac{h_N}{N} > \frac{1}{4} = \epsilon.$$

Obviously it cannot be asserted that this will not happen, 
unless the result of a single throw depends on what has 
gone before. 

On the other hand, if $h_N/N$ does not exceed 1/2, and if the 
throws from N + 1 to 2N should all yield heads, it would be 
true that $h_{2N} = h_N + N$, whence 

$$\frac{h_{2N}}{2N} - \frac{h_N}{N} = \frac{1}{2} - \frac{1}{2}\,\frac{h_N}{N} \ge \frac{1}{4} = \epsilon;$$



and it cannot be asserted that this will not happen, either. 
It is therefore impossible to say that the sequence has a limit. 
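
The two cases of this argument can be checked numerically for any starting count. The sketch below is illustrative only: the function names are hypothetical, and the second function assumes a fair penny with independent throws, which is an assumption added for the example.

```python
from fractions import Fraction

def worst_case_gap(h_N: int, N: int) -> Fraction:
    """Gap between h_2N/2N and h_N/N for the unfavourable continuation above."""
    ratio_N = Fraction(h_N, N)
    if ratio_N > Fraction(1, 2):
        h_2N = h_N          # throws N+1 .. 2N all yield tails
    else:
        h_2N = h_N + N      # throws N+1 .. 2N all yield heads
    return abs(Fraction(h_2N, 2 * N) - ratio_N)

def chance_of_that_run(N: int) -> Fraction:
    """Probability of N prescribed outcomes in a row (fair penny assumed)."""
    return Fraction(1, 2 ** N)

print(worst_case_gap(7, 10), chance_of_that_run(10))
```

However large N is taken, the gap is never below 1/4 and the chance of the offending run, though small, is never zero.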

The trouble lies, fundamentally, in the fact that the set 
of probabilities corresponding to a run of one head, two heads, 
three heads, and so on, though it has the limit zero, never 
reaches that limit. If it ever did reach it and remained there 
thereafter, the ratio of heads to throws would also have a 
limit. This fact would then be a theorem, not a postulate. 
But so long as the throws are independent, the next throw 
after a long run of heads may also be a head: no length of run 
is impossible, and the ratio need not (though it very probably 
will) approach a limit. 

Or, if we prefer to look at it from a slightly different point 
of view, the trouble lies in the fact that the two positive 
assertions, “‘ The trials are independent,” and “ The sequence 
approaches a limit,” are inconsistent, and cannot be made the 
basis for a definition. 

What can be done is this: an a priori definition of probability 
being allowed, it can be proved that the PROBABILITY of any two 
terms differing by more than ε can be made as small as desired 
by taking N large enough. It is the a priori probability, not 
the sequence of experimental ratios, which has a limit. This will 
become more apparent in the course of the next few sections. 


§ 37. Regarding Repeated Independent Trials 


Consider an urn in which three balls are placed: one white 
and two black. Suppose repeated drawings are made from 
this urn, the balls being returned after each. According to 
(23), § 25, the chance of just n white balls in m trials is 

$$P_m(n) = C^m_n \left(\tfrac{1}{3}\right)^{n} \left(\tfrac{2}{3}\right)^{m-n}.$$


Suppose now that five trials are made in succession. Either 
one of six things may happen: None of the trials may give a 
white ball, or one of them, or two, or three, or four, or five. 
The probability of each of these six results can be computed, 
the results being given in the accompanying Table VIII. 

In this case there are two “most probable” results. One 
white ball or two white balls are equally likely to appear, 
and either is more probable than any other possible result. 


TABLE VIII 

THE PROBABILITY OF n SUCCESSES IN FIVE TRIALS 
IF THE PROBABILITY IN A SINGLE TRIAL IS 1/3 

n    Probability      n    Probability      n    Probability 
0    0.1317           2    0.3292           4    0.0412 
1    0.3292           3    0.1646           5    0.0041 


If ten trials are made instead of five there are eleven possible 
results. The probabilities of these eleven results individually 
are given in Table IX. In this case there is one “most 
probable” number of white balls, the number being three. 
The least probable number of white balls is, of course, ten, 
for which the value given in the table is 0.0000. This does 
not mean that ten white balls could never appear in succession. 
It only means that the chance is so small as to be 
negligible in the fourth decimal place (that is, less than one-half 
of 0.0001). The exact probability is 1/59,049 = 0.0000169. 


TABLE IX 

THE PROBABILITY OF n SUCCESSES IN TEN TRIALS 
IF THE PROBABILITY OF SUCCESS IN A SINGLE TRIAL IS 1/3 

n    Probability      n    Probability      n    Probability 
0    0.0173           4    0.2276           8    0.0030 
1    0.0867           5    0.1367           9    0.0003 
2    0.1951           6    0.0569           10   0.0000 
3    0.2601           7    0.0163 


If fifty trials are made the results are as shown in Table X. 
Again there are two equally likely results, 16 and 17, each of 
which is more probable than any other. 
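
The entries of these tables follow directly from the formula for P_m(n) and can be recomputed exactly. This is a minimal sketch, assuming p = 1/3 as in the urn with one white and two black balls; the function name is an invention for this example.

```python
from fractions import Fraction
from math import comb

def binomial_table(m: int, p: Fraction) -> list[Fraction]:
    """Exact P_m(n) = C(m, n) p^n (1 - p)^(m - n) for n = 0 .. m."""
    q = 1 - p
    return [comb(m, n) * p ** n * q ** (m - n) for n in range(m + 1)]

p = Fraction(1, 3)
for m in (5, 10, 50):
    table = binomial_table(m, p)
    peak = max(table)
    # Exact fractions make ties between neighbouring entries detectable.
    modes = [n for n, value in enumerate(table) if value == peak]
    print(m, modes, f"{float(peak):.4f}")
```

Working in exact fractions rather than decimals is a design choice here: it lets two "equally likely" results be recognised as exactly equal instead of merely close.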


TABLE X 

THE PROBABILITY OF n SUCCESSES IN FIFTY TRIALS 
IF THE PROBABILITY OF SUCCESS IN A SINGLE TRIAL IS 1/3 

n      Probability      n     Probability      n      Probability 
< 5    0.0000           13    0.0679           23     0.0202 
                        14    0.0898           24     0.0113 
5      0.0001           15    0.1077           25     0.0059 
6      0.0004           16    0.1178           26     0.0028 
7      0.0012           17    0.1178           27     0.0012 
8      0.0033           18    0.1080           28     0.0005 
9      0.0077           19    0.0910           29     0.0002 
10     0.0157           20    0.0704           30     0.0001 
11     0.0287           21    0.0503 
12     0.0470           22    0.0332           > 30   0.0000 


The first and last entries mean, not that these cases cannot 
occur, but that their chances of occurrence are less than one 
in 20,000. As a matter of fact there is a finite probability for 
drawing a white ball in every one of the fifty trials, but it is 
exceedingly small. In fact it is¹ 

$$\frac{1}{3^{50}} = \frac{1}{717{,}898{,}000{,}000{,}000{,}000{,}000{,}000}.$$


§ 38. The Limiting Condition as the Number of Trials is Greatly 
Increased 


The probabilities in Tables VIII to X are plotted as 
ordinates in Fig. 7. 

In this form of graphical presentation several facts stand 
out at once: 


1. There is an orderly progression from one graph to the 
next. 


2. The most probable number of successes increases pro- 
gressively; that is, the greater the number of trials the greater 
the most probable number of successes. 


3. The probability of this most probable number of successes 
decreases progressively; that is, the greater the number of trials 
the less the chance of coming out with the most probable result. 


4. The number of different results which have probabilities 
comparable to that of the most probable result increases pro- 
gressively. This result can be expressed quite simply by say- 
ing: the greater the number of trials, the greater the spread of 
the chart. In other words: the probability of missing the most 
probable result by more than a stated amount increases con- 
stantly as the number of trials is increased. 


This last point is worthy of some further discussion. 

Suppose, for example, that we ask for the chance of missing 
the most probable result by more than five units. If only 
five trials are made it is impossible to miss the most probable 
result by more than five units. In this case the probability 


¹ Numbers such as this are inconceivable. I suggest, as an entertaining exercise 
of the imagination which is worthwhile just once, the computation of the dimensions 
of a container which would hold $3^{50}$ beads. Having done this, if just one were black 
and the rest all white, and if the bunch were thoroughly mixed, the chance of drawing 
the black one would be the fraction in question. “Too small to matter,” you say: 
yet the chance of getting the one you did get was no greater! 


asked for is zero. On the other hand, if ten trials are made, 
the probability of missing the most probable result by more 
than five units, though quite small, is a finite value. It is² 
0.0003. In the case of fifty trials the chance of missing the 
most probable result by more than five is 0.13. Moreover, 
this probability can be made as large as we please by taking 
the number of trials large enough. Thus, for 100 trials it is 
0.29, for 1000 trials 0.74; and for 1,000,000 trials about 0.99. 

Fig. 7. The Probability of n Successes in m Trials if the Probability 
of Success in a Single Trial is 1/3 

² This figure is obtained as follows: In Table IX the most probable value of n is 3. 
To miss it by more than 5 would require that n be less than zero, which is impossible, 
or more than 8, for which the figure 0.0003 is obtained by adding together the 
last two entries in the table. 

The figures for 50, 100 and 1000 trials are obtained in a similar way. 

Though these figures are based upon missing the most probable 
result by five units, the same qualitative facts apply to 
missing it by any preassigned number of units. 


In an infinity of trials the probability approaches certainty, 
that the observed result will differ from the most probable result 
by more than any preassigned number, no matter how large. 


It is this spreading out of results which is responsible for 
the tendency toward decreasing probability of the most prob- 
able result. Since one or the other of the various results is 
bound to happen, the sum of the ordinates on each of the 
graphs must of necessity equal unity. As the number of 
ordinates of comparable length increases, the magnitude of 
each must be necessarily decreased. 

Put in still another form this statement becomes: The 
probability of missing the most probable result by more than 
five is the sum of all the ordinates which lie outside a band 
width of five on each side of the highest ordinate. Since this 
sum increases progressively as the number of trials is increased, 
it follows that the sum of all the ordinates within the band 
must correspondingly decrease. 
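The sum of the ordinates outside the band can be computed directly. The following is a minimal sketch, assuming p = 1/3 as in Fig. 7; the function names are hypothetical, and where two results are equally probable the smaller one is taken as "the most probable result", which is an assumption of this sketch. For ten trials the sum is 21/3^10, or about 0.00036, in agreement with the 0.0003 quoted above; for larger numbers of trials the exact sums may differ somewhat from the rounded figures of the text.

```python
from fractions import Fraction
from math import comb


def pmf(m, n, p):
    """Exact probability of n successes in m trials (p is a Fraction)."""
    return comb(m, n) * p**n * (1 - p) ** (m - n)


def miss_probability(m, p, k=5):
    """Chance of missing the most probable result by more than k units."""
    probs = [pmf(m, n, p) for n in range(m + 1)]
    mode = probs.index(max(probs))  # the smaller value if two are tied
    return sum(q for n, q in enumerate(probs) if abs(n - mode) > k)


p = Fraction(1, 3)
for m in (10, 50, 100, 1000):
    print(m, float(miss_probability(m, p)))
```

The printed values increase with the number of trials, which is the spreading out described above.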


5. The most probable number of successes is always approx- 
imately one-third the number of trials. 


This is illustrated in the figure by a short vertical line
drawn near the top of the curve for that value of n which
corresponds to m/3. In the case of five trials m/3 is 1⅔ and
lies between 1 and 2. These are the two most probable results.
In the case of ten trials, m/3 is 3⅓ and lies between 3 and 4.
The most probable number of successes is 3. For fifty trials
the line comes at 16⅔ and it was found that 16 and 17 were
each equally likely. For 100 the line comes at 33⅓ and 33
is the most probable number of successes.

Since the probability of success in a single trial is 1/3, this
suggests the rule that the most probable number of successes
in m trials is mp, where p is the probability of success in a
single trial, provided mp is an integer; otherwise the most
probable number of successes is one or the other of the integers
between which mp lies. Although it is not always safe to
generalize particular cases in this fashion, it happens in the
present instance that the result is correct.
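The four cases just cited can be checked by brute force. The sketch below is illustrative only: the helper names are hypothetical, and exact rational arithmetic is used so that ties between two equally probable results are detected without rounding error.

```python
from fractions import Fraction
from math import comb


def pmf(m, n, p):
    """Exact probability of n successes in m trials (p is a Fraction)."""
    return comb(m, n) * p**n * (1 - p) ** (m - n)


def most_probable(m, p):
    """All values of n at which the probability is greatest."""
    probs = [pmf(m, n, p) for n in range(m + 1)]
    peak = max(probs)
    return [n for n, q in enumerate(probs) if q == peak]


p = Fraction(1, 3)
for m in (5, 10, 50, 100):
    # Each most probable n should be one of the integers adjacent to mp.
    print(m, float(m * p), most_probable(m, p))
```

In every case the result lies within one unit of mp, as the rule requires.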


FIG. 8. An Alternative Form of Fig. 7 (legible panel labels: m = 5, m = 50)


There is one more thing which it is worth while to do with
Fig. 7. Since the ordinates which represent the probabilities
occur at unit intervals while intermediate values of n have no
significance, it is possible to erect a rectangle of unit width
upon each ordinate, thus producing a set of rectangles the
areas of which are equal to the probabilities of the values of n
upon which they stand.

When so treated the diagram takes the form shown in Fig.
8, in which the graphs corresponding to five trials and ten
trials are drawn exactly as explained. The other graphs differ
from these only in the fact that the vertical sides of the rectangles,
which contribute nothing to the interpretation of the
graphs, are omitted, leaving only a broken line. This broken
line has two unique properties. One is the property from
which it was derived: that the area under each step is the
probability of the corresponding value of n. The other
property comes from the fact that the set of values of n is
complete; it is that the area under the entire broken curve is
unity.

A curve constructed in this fashion is called a “distribution
curve” for the variable n.


§ 39. Bernoulli's Theorem 


The fact that the most probable number of successes in
Figs. 7 and 8 is just about 1/3 the number of trials suggests
plotting the set of curves once more, using, however, n/m
instead of n as abscissa. In doing this we shall obtain a sort
of distribution curve for the proportion of successes, or as we
shall more often call it, a “percentage distribution curve.”
But in order that the term “distribution curve” may have the
same meaning as before, the two unique properties to which
reference was made in § 38 will be conserved. That is, the
areas of the rectangles will be kept constant, no matter what
happens to the ordinates. Naturally, since the curve for
m = 100, say, will be condensed more laterally than will the
curve for m = 10, it will also be stretched more vertically,
with the result that the curves will be differently related than
before.
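The rescaling just described can be sketched numerically. In the sketch below each rectangle is given width 1/m and height m times the probability, so that its area is unchanged; the function names are hypothetical, p = 1/3 is assumed, and the probabilities are evaluated in floating point through the log-gamma function.

```python
import math


def pmf(m, n, p):
    """Probability of n successes in m trials, via log-gamma to avoid overflow."""
    log_q = (math.lgamma(m + 1) - math.lgamma(n + 1) - math.lgamma(m - n + 1)
             + n * math.log(p) + (m - n) * math.log1p(-p))
    return math.exp(log_q)


def percentage_distribution(m, p):
    """Pairs (n/m, height); each rectangle has width 1/m and area P_m(n)."""
    return [(n / m, m * pmf(m, n, p)) for n in range(m + 1)]


for m in (50, 100, 1000):
    curve = percentage_distribution(m, 1 / 3)
    area = sum(height / m for _, height in curve)   # should remain unity
    peak = max(height for _, height in curve)       # grows with m
    print(m, round(area, 6), round(peak, 2))
```

The total area stays at unity while the highest rectangle grows taller as m increases, which is the vertical stretching referred to above.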

The reconstructed family is shown in Fig. 9 for m = 50,
100 and 1000.¹ Again there is a marked progressive tendency


¹ For m = 1000 the steps are so small that they cannot be shown.




among the curves, but the laws of progression are no longer 
the same as before. They may now be stated as: 


1. The most probable proportion of successes remains 
approximately the same as m increases. 


FIG. 9. An Alternative Form of Fig. 7 (abscissa: n/m)


2. This most probable proportion is always as near to p as
it can be, considering the fact that n must be an integer.


3. The height of the rectangle which represents this most 
probable proportion increases as the number of trials increases. 


4. The spread of the percentage distribution curve decreases
as the number of trials is increased. That is, although the chance
of missing the most probable value of n by more than a preassigned
amount gets greater and greater as the number of trials
increases (a fact to which attention was called in the last section),
the chance of missing this most probable value of n by a
given percentage decreases continually.


It is easy to see that the chance of n/m differing from 1/3
by less than a preassigned amount ε is represented in Fig. 9
by the area bounded laterally by a pair of vertical lines¹
ε units to each side of 1/3. In the figure ε is taken as 0.04.
With only 50 trials more than half the area lies outside these
limits. That is, the proportion of successes is more likely
to lie outside the limits 0.293 and 0.373 than inside them.
In the case of 100 trials the area outside the boundary is considerably
reduced, and the proportion of successes is more
likely to lie between the prescribed limits than not. In the
case of 1000 trials almost the entire area lies between these
limits: there is very little probability that in so extensive an
experiment the proportion of successes would differ from 1/3
by as much as 0.04. By increasing m still further, a point
would eventually be reached where the chance of the proportion
of successes lying outside the prescribed range would be
smaller than any arbitrarily fixed quantity. In other words,
as m approaches infinity, the chance of n/m lying outside the
limits p − ε, p + ε approaches zero, and the chance that it
lies within these limits approaches certainty.
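The three cases just described can be verified by adding the ordinates whose abscissas n/m lie within 0.04 of 1/3. The following is a minimal sketch with a hypothetical function name; a step is counted when its midpoint n/m falls strictly inside the band, and exact fractions are used throughout.

```python
from fractions import Fraction
from math import comb


def chance_within(m, p, eps):
    """Exact chance that n/m differs from p by less than eps."""
    return sum(
        comb(m, n) * p**n * (1 - p) ** (m - n)
        for n in range(m + 1)
        if abs(Fraction(n, m) - p) < eps
    )


p, eps = Fraction(1, 3), Fraction(4, 100)
for m in (50, 100, 1000):
    print(m, round(float(chance_within(m, p, eps)), 4))
```

The chance is below one-half for 50 trials, above one-half for 100, and close to unity for 1000.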

This tendency has been illustrated by the use of ε = 0.04
and p = 1/3, but the conclusion reached would be the same no
matter what values were chosen. If, for example, ε were
taken as 0.000001, the chance of n/m lying within p − ε, p + ε
would be extremely small for a hundred trials or even a 
thousand trials; but it would get larger and larger as the 
number of trials was increased until with some very large 
number of trials, say a million million, it would be almost 
certainty. This fact can be expressed in the form of a theorem 
as follows: 


¹ Of course those “steps” which lie partly within this band and partly outside
it are either to be entirely included, or else entirely excluded, according as their mid- 
points lie within or without the band. 




BERNOULLI’S THEOREM: If the chance of an event
occurring upon a single trial is p, and if a number of inde- 
pendent trials are made, the probability that the ratio of 
the number of successes to the number of trials differs from 
p by less than any preassigned quantity, however small, can 
be made as near certainty as may be desired by taking the 
number of trials sufficiently large. 
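In symbols the theorem may be restated as follows. This is only a transcription of the statement above: P(·) is used here for "the probability that", and ε for the preassigned quantity, both being notations assumed for this restatement.

```latex
% Restatement of Bernoulli's Theorem. P(.) and \varepsilon are notation
% assumed for illustration; n is the number of successes in m trials.
\[
\lim_{m \to \infty} P\left( \left| \frac{n}{m} - p \right| < \varepsilon \right) = 1
\qquad \text{for every fixed } \varepsilon > 0 .
\]
```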


Sometimes the content of a theorem such as this is made 
clearer by throwing mathematical discretion to the winds and 
stating it in the form of every-day language. The present 
appears to be a case of this sort, and therefore we restate the 
theorem as follows: 

If the probability of an event is p, and if an infinity of trials 
are made, the proportion of successes is sure to be p. 

This, of course, is exactly the statement to which we
objected in § 34; yet the statement is as certainly “true”
in one sense of the word, as it is not “true” in another. That
it fails to stand the test of mathematical rigor, I believe the 
argument of § 36 shows. It is therefore not a fit foundation 
for a mathematical theory. But our every-day life is not 
conducted on such rigorous requirements as to “truth.” You 
say, “ Are you sure that he is coming tomorrow?” and receive 
the answer, “ Yes.” Both you and your informant under- 
stand what you mean: the event is contingent upon his not 
dying, for example, and perhaps on many other unforeseen 
circumstances. It is, in fact, not sure at all; it is merely very
probable: so probable that the residual doubt is not worth
expression. Our statement is in the same class. In fact, the
residual doubts are even vastly smaller, and may quite properly 
remain unexpressed. 

By painstaking experiment with the bad penny of § 33 we 
can learn the extent of the bias in favor of tails, in the sense 
that the chance of serious error is negligible. Or we can 
accumulate vital statistics and learn the chance of a man, about 
whose state of health we have no special information, dying 
at forty, with quite enough assurance for the purposes of a life 




insurance company. But we should not be unaware of the 
logical status of what we are doing.

I agree with the statistician who says, “ Life insurance is 
no gamble. Its laws are as immutable as those which cause 
the sun to rise.” In a sense, what he says is true. But I 
cannot agree with him when he tells me that the entire logical 
foundation of the Theory of Probability is to be found in the 
taking of statistical data. 


§ 40. Résumé 


It has seemed impossible to give a discussion of Bernoulli’s 
Theorem without traveling rather far afield at times, and as 
these excursions have removed the emphasis somewhat from 
the facts upon which it should rest, it is probably desirable to 
put those facts together in a compact form. As they divide 
themselves into two sets, one concerned with the number 
of times an event occurs, the other with the proportion which 
that number bears to the number of trials, they will be listed in 


parallel columns. 


Facts about the Number of Successes       Facts about the Proportion of Successes

(1) In m independent trials under         (1) In m independent trials under
the same essential conditions, the        the same essential conditions, the
number of times an event occurs, n,       proportion of times an event occurs,
may take any value from 0 to m.           n/m, may take any value from 0 to 1.

(2) There is a “most probable”            (2) There is a “most probable”
number of successes. (There may           proportion of successes. (There may
be two.)                                  be two.)

(3) This most probable number             (3) This most probable proportion
is pm, when pm is an integer;             is either p, or the integral multiple
otherwise it is one (or both) of the      of 1/m next smaller than p, or
adjacent integers.                        the one next larger than p, or both.

(4) The chance of the number of           (4) The chance of the proportion
successes differing from the most         of successes differing from the most
probable number by less than a fixed      probable proportion by less than a
amount, no matter how large,              fixed amount, no matter how small,
approaches zero as the number of          approaches unity as the number of
trials is indefinitely increased.         trials is indefinitely increased.

Loosely: In an infinity of trials         Loosely: In an infinity of trials the
the difference between the actual         difference between the actual
number of successes and the most          proportion of successes and the most
probable number will be infinite.         probable proportion will be zero.




§ 41. Mathematical Justification 


So far in this discussion we have avoided formal mathe- 
matics entirely, basing our argument on common-sense infer- 
ences rather than proofs. We must now correct that defect. 
To start with we shall prove that the most probable number
of successes is either mp or an adjacent integer, by the process
of showing that each ordinate of Fig. 7 is bigger than the preceding
one up to mp, and less than the preceding one from
there on.

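If we take two successive values of n, say n − 1 and n,
and divide the probability of the second by that of the first,
we obtain

\[
\frac{P_m(n)}{P_m(n-1)} = \frac{m-n+1}{n}\cdot\frac{p}{1-p}
= \frac{1 - \dfrac{n}{m} + \dfrac{1}{m}}{1-p}\cdot\frac{p}{\dfrac{n}{m}} .
\]

Now, reducing the numerator of the last member (by cancelling
1/m) makes this member too small. Hence

\[
\frac{P_m(n)}{P_m(n-1)} > \frac{1 - \dfrac{n}{m}}{1-p}\cdot\frac{p}{\dfrac{n}{m}} . \tag{34}
\]

Similarly, reducing the denominator by inserting −1/m makes
the last member too large. Hence

\[
\frac{P_m(n)}{P_m(n-1)} < \frac{1 - \dfrac{n}{m} + \dfrac{1}{m}}{1-p}\cdot\frac{p}{\dfrac{n}{m} - \dfrac{1}{m}} . \tag{35}
\]

To establish our theorem we need only notice that, if
n ≤ pm, both factors on the right of (34) are 1 or greater,
and therefore P_m(n) > P_m(n − 1). But if n − 1 ≥ pm, both
factors on the right of (35) are 1 or less, and (35)
gives P_m(n) < P_m(n − 1).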

These results, interpreted graphically, say that each
ordinate of Fig. 7 is bigger than the preceding one up to the
stroke at mp (and including the ordinate at mp if there is one);
and similarly that beyond mp every ordinate is smaller than the
preceding one. Hence the first ordinate past the stroke is
bigger than all that follow, and the last one before the stroke
is bigger than all that precede. Either one of these is larger
than the other, in which case it is a maximum; or else they are
equal, in which case they constitute a pair of maxima.

Thus we have justified points (2) and (3) of our résumé. 
That the argument applies to Fig. 9 as well as to Fig. 7 is 
obvious, since they are identical except for scale. Hence the 
proof includes both columns of the résumé. 
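The ratio of successive ordinates used in this proof, and the two conclusions drawn from it, can be checked exactly for particular cases. The sketch below uses hypothetical function names and exact fractions; it confirms that the ratio equals (m − n + 1)p / n(1 − p), that it exceeds unity whenever n ≤ mp, and that it falls short of unity whenever n − 1 ≥ mp.

```python
from fractions import Fraction
from math import comb


def pmf(m, n, p):
    """Exact probability of n successes in m trials (p is a Fraction)."""
    return comb(m, n) * p**n * (1 - p) ** (m - n)


def check(m, p):
    """True if the ordinates rise up to mp and fall beyond it, as proved."""
    for n in range(1, m + 1):
        ratio = pmf(m, n, p) / pmf(m, n - 1, p)
        if ratio != Fraction(m - n + 1, n) * p / (1 - p):
            return False          # closed form of the ratio fails
        if n <= m * p and not ratio > 1:
            return False          # ordinate should exceed the preceding one
        if n - 1 >= m * p and not ratio < 1:
            return False          # ordinate should be below the preceding one
    return True


print(check(10, Fraction(1, 3)), check(50, Fraction(1, 3)), check(12, Fraction(1, 4)))
```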


§ 42. Stirling’s Formula 


To justify the statements (4) we need some means of estimating
the value of the expression C_n^m p^n (1 − p)^(m−n) when m is very large
and n is nearly equal to mp, which usually means that it is very
large too. Under these circumstances, however, C_n^m contains three
factorials of very large numbers, and we have already seen in § 10
that such quantities become almost inconceivably large. For this
investigation we need a new tool: an approximate formula for n!
when n is large.

In this case, in speaking of an “approximate formula” we do
not imply that the difference between the true value and the approximation
is small; it is the percentage error which is of consequence.
For example, a formula which gave a result 3.0415 × 10^64 for
50! would be a good approximation, for it differs from 50! only in
the fifth place. But the actual magnitude of the difference is nevertheless
a very large number. It is of the order of magnitude of 10^60.
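The distinction between the two kinds of error is easily exhibited with exact integer arithmetic. In the sketch below the value 3.0415 × 10^64 is the one quoted above; the variable names are illustrative.

```python
from math import factorial

true_value = factorial(50)            # exact value of 50!
approx = 30415 * 10**60               # 3.0415 x 10^64, written as an integer

absolute_error = approx - true_value  # an enormous number in itself
relative_error = absolute_error / true_value

print(f"absolute error ~ {float(absolute_error):.2e}")
print(f"relative error ~ {relative_error:.2e}")
```

The absolute error is of the order of 10^60, yet the relative error is only a few parts in 100,000.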

In § 11 we adopted the definition

\[
m! = \int_0^\infty x^m e^{-x}\,dx, \tag{6}
\]

which, interpreted graphically, means that factorial m is the area
included between the curve

\[
y = x^m e^{-x}
\]

and the x-axis. We get our first indication as to how to proceed
in attempting to find an approximation formula by considering a
few of the curves that arise from various values of m. It appears
from Fig. 10a that when m is large the principal contribution to the
value of the factorial is made by that part of the curve in the neighborhood
of the maximum ordinate. The “tails” in the neighborhood
of x = 0 and x = ∞ dwindle away very rapidly and probably
contribute little to the actual integral. This suggests that the area


FIG. 10. Illustrating the Changes of Variable by Means of Which Stirling’s
Formula Is Obtained.


itself will be roughly proportional to the maximum ordinate, at least 
in the sense that the ratio of the two will not vary with m in such 
an extreme fashion as does the factorial itself. 

It is a simple matter to determine this maximum ordinate Y.
It turns out to occur at the point x = m, and to have the value

\[
Y = m^m e^{-m}.
\]

Dividing (6) by Y, and denoting the ratio by f(m), we have

\[
f(m) = \frac{m!}{m^m e^{-m}} = \int_0^\infty \left(\frac{x}{m}\right)^{m} e^{-(x-m)}\,dx .
\]

This integral can also be interpreted as the area under a curve:
the particular curve, of course, being identical with the corresponding
curve in Fig. 10a, except that the vertical scale has been reduced
in such a way as to make the maximum ordinate unity. Such a set
of curves is shown in Fig. 10b.

We now shift these new curves so that the maximum ordinate of
each lies on the y-axis (which means that we replace x by a new
variable x′ = x − m), and then contract them in the direction of
the x-axis in the ratio 1 to m (by replacing x′ by a new variable
x″ = x′/m). The result of the two substitutions is

\[
f(m) = m \int_{-1}^{\infty} \left[(x''+1)\,e^{-x''}\right]^{m} dx'' ,
\]

or, since the primes are of no further service,

\[
f(m) = m \int_{-1}^{\infty} \left[(x+1)\,e^{-x}\right]^{m} dx . \tag{36}
\]

The curves corresponding to this integral are shown in Fig. 10c.

The next step in our process is an ingenious one, only to be
justified by the fact that it works. We replace x by a new variable
u, defined by the equation

\[
(x+1)\,e^{-x} = e^{-u^2}; \tag{37}
\]

or what amounts to the same thing,

\[
x - \log (x+1) = u^2 .
\]

Then we have

\[
dx = 2u\,du + \frac{2u}{x}\,du ,
\]

and (36) breaks up into the sum of the two terms

\[
f(m) = m\int e^{-mu^2}\,2u\,du + 2m\int e^{-mu^2}\,\frac{u}{x}\,du .
\]

For the moment we disregard the question of the limits of integration.

The first term in this expression integrates immediately into
−e^(−mu²), that is, into −[(x + 1) e^(−x)]^m. As this quantity vanishes
for both x = −1 and x = ∞, the first term is zero. This leaves
us finally with the integral

\[
f(m) = 2m\int e^{-mu^2}\,\frac{u}{x}\,du . \tag{38}
\]




There is now nothing to do but to evaluate this integral directly.
But to do this it is necessary to obtain an expression for x in terms
of u, and as (37) is transcendental the solution can only be obtained
as a series. It is found that¹

\[
\frac{u}{x} = \sqrt{2}\left(\frac{1}{2} + \frac{1}{12}u^2 + \frac{1}{432}u^4 - \frac{139}{194400}u^6 + \dots\right)
- \left(\frac{1}{3}u + \frac{4}{135}u^3 + \dots\right). \tag{39}
\]


We must now determine the limits of integration. From (37)
it follows that u² = ∞, both when x = −1 and when x = ∞.
Hence u is ±∞ at both limits of integration. If the limits were
equal the integral would vanish,² and we are sure from geometrical
considerations that this cannot occur. It therefore follows that we
must assign −∞ as one limit and +∞ as the other. If we were
to assign them wrongly we would merely change the sign of the integral;
and as we know that the value of the factorial must be positive
we would detect that error at once. By trial it is found that
+∞ should be chosen as the upper limit and −∞ as the lower one.
Then, substituting (39) in (38) and integrating term by term, we
arrive at the final result

\[
f(m) = \sqrt{2\pi m}\left(1 + \frac{1}{12m} + \frac{1}{288m^2} - \frac{139}{51840m^3} + \dots\right),
\]

and therefore

\[
m! = \sqrt{2\pi m}\; m^m e^{-m}\left(1 + \frac{1}{12m} + \frac{1}{288m^2} - \frac{139}{51840m^3} + \dots\right). \tag{40}
\]

This is known as Stirling’s formula.³ As an illustration of its
use we may compute 100! which turns out to be

\[
100! = \sqrt{200\pi}\; 100^{100} e^{-100}\left(1 + \frac{1}{1200} + \frac{1}{2880000} - \dots\right)
= 9.3326215 \times 10^{157}.
\]

This is identical with the result given in Appendix I.


¹ The region of convergence of this series is |u| < √(2π). We do not here attempt
to justify its use with infinite limits.


² If u is defined by the principal branch of the logarithm, u² is positive for every
value of x on the path of x-integration. Therefore u does not leave the real axis.
As there are no singularities of u/x on the real axis, other than at u = ±∞, a path
from infinity to a finite point and back could only return via the same branch of the
function.


³ The name is usually applied to the first term only:

\[
m! = \sqrt{2\pi m}\; m^m e^{-m}.
\]




The following table contains several additional factorials, together
with their Stirling approximations to one and four terms:


TABLE XI

COMPARISON OF THE FACTORIAL WITH ITS STIRLING APPROXIMATIONS

        True Value             One Term                        Four Terms
  m     m!                     m!                  % Error     m!                    % Error

  1     1                      0.922137            8           0.999711              0.03
  2     2                      1.919005            4           1.999986              0.0007
  5     120                    118.0192            2           120.0000
  10    3,628,800              3,598,696           0.8         3,628,800
  100   9.3326215 × 10^157     9.324847 × 10^157   0.08        9.3326215 × 10^157


It will be noticed that for values of m greater than 10 even the 
simpler formula gives quite acceptable accuracy. For smaller values 
it would not be used because of the ease with which the factorials 
themselves may be found. 
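The entries of such a table can be recomputed in a few lines. This is a sketch only: the function name `stirling` and its `terms` argument are hypothetical, the four coefficients are those of the series 1 + 1/12m + 1/288m² − 139/51840m³, and ordinary floating point is used, so only the leading figures are significant.

```python
import math

# Coefficients of 1, 1/m, 1/m^2, 1/m^3 in the Stirling series.
COEFFS = (1.0, 1 / 12, 1 / 288, -139 / 51840)


def stirling(m, terms=1):
    """Stirling approximation to m! using the first `terms` terms of the series."""
    series = sum(c / m**k for k, c in enumerate(COEFFS[:terms]))
    # m^m e^(-m) is formed through its logarithm to avoid overflow.
    return math.sqrt(2 * math.pi * m) * math.exp(m * math.log(m) - m) * series


for m in (1, 2, 5, 10, 100):
    true = math.factorial(m)
    one, four = stirling(m, 1), stirling(m, 4)
    print(m, f"{one:.7g}", f"{100 * (true - one) / true:.2g}%", f"{four:.8g}")
```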


§ 43. Another Approximate Formula 


We still need an approximation to (1 + x/m)^m when m is very
large. Obviously, when m is very large 1 + x/m is only slightly
different from 1. Hence any moderate power would also be but 
slightly different from 1; but the mth power is not “ moderate,” 
so the difference may be considerable. 

First let us find the limit approached as m becomes infinite.
This we do by writing

\[
\log\left(1+\frac{x}{m}\right)^{m} = m\,\log\left(1+\frac{x}{m}\right).
\]

Obviously, since

\[
\log\left(1+\frac{x}{m}\right) = \frac{x}{m} - \frac{x^2}{2m^2} + \frac{x^3}{3m^3} - \dots,
\]

\[
\log\left(1+\frac{x}{m}\right)^{m} = x - \frac{x^2}{2m} + \frac{x^3}{3m^2} - \dots .
\]

Hence

\[
\left(1+\frac{x}{m}\right)^{m} = e^{\,x - (x^2/2m) + (x^3/3m^2) - \dots},
\]

which obviously approaches the limit e^x as m approaches infinity.
Unity would therefore have been a very poor approximation. The
fact that

\[
\lim_{m\to\infty}\left(1+\frac{x}{m}\right)^{m} = e^{x} \tag{41}
\]

is the first important result of this section.

If we like, we may think of e^x as a first approximation to the value
of (1 + x/m)^m, when m is large. We can, however, obtain a better
approximation by segregating the factor e^(−(x²/2m) + (x³/3m²) − …)
and expanding it in a series of decreasing powers of m. As a result
we find

\[
\left(1+\frac{x}{m}\right)^{m} = e^{x}\left[1 - \frac{x^2}{2m} + \left(\frac{x^4}{8} + \frac{x^3}{3}\right)\frac{1}{m^2} + \dots\right]. \tag{42}
\]

This is the second important result of the section.
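The two results may be compared numerically. In the sketch below the function names are hypothetical; `second_approx` carries the series only as far as the term in 1/m², so its error should be of the order of 1/m³.

```python
import math


def exact(x, m):
    """The quantity (1 + x/m)^m itself."""
    return (1 + x / m) ** m


def first_approx(x):
    """The limiting value e^x."""
    return math.exp(x)


def second_approx(x, m):
    """e^x times the series in descending powers of m, through 1/m^2."""
    return math.exp(x) * (1 - x**2 / (2 * m) + (x**4 / 8 + x**3 / 3) / m**2)


x = 1.0
for m in (10, 100, 1000):
    e0 = abs(first_approx(x) - exact(x, m))
    e1 = abs(second_approx(x, m) - exact(x, m))
    print(m, f"{e0:.3e}", f"{e1:.3e}")
```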


§ 44. Justification of the First Half of Bernoulli’s Theorem 


We are now ready to justify Bernoulli’s Theorem by prov- 
ing two things: 


(a) That the maximum ordinate approaches zero as a limit.
As there are only a finite number of ordinates between mp − ε
and mp + ε, and as none exceeds the maximum, the sum of all
of them must therefore approach zero, which proves half the
theorem.


(b) That the ordinates adjacent to a fixed value of n/m decrease
so rapidly that the sum of all ordinates outside the fixed band in
Fig. 9 vanishes. This will prove the other half. The proof of
(a) will be given in this section; the proof of (b) in the next.

We begin with the formula

\[
P_m(n) = C^m_n\, p^n (1-p)^{m-n} \tag{23}
\]

and replace all factorials by their Stirling approximations.
The result is

\[
P_m(n) = \sqrt{\frac{m}{2\pi n(m-n)}}\left(\frac{m(1-p)}{m-n}\right)^{m}\left(\frac{(m-n)\,p}{n\,(1-p)}\right)^{n}. \tag{43}
\]

Now the most probable n does not differ from mp by more
than a unit. Hence

\[
mp - 1 < n < mp + 1, \qquad m(1-p) - 1 < m - n < m(1-p) + 1 .
\]

As replacing the denominators by smaller numbers, and the
numerators by larger ones, increases the right-hand side of (43)
it is obvious that

\[
P_m(n) < \sqrt{\frac{m}{2\pi (mp-1)[m(1-p)-1]}}
\left(\frac{m(1-p)}{m(1-p)-1}\right)^{m}
\left(\frac{m(1-p)+1}{mp-1}\cdot\frac{p}{1-p}\right)^{n}.
\]

From this point on we consider the three factors separately.

First it is obvious that the first factor approaches zero
with increasing m; for m occurs twice in the denominator and
only once in the numerator. Hence if the other factors do
not become infinite our proposition is proved.

The second factor, however, is of the form

\[
\frac{1}{\left(1 - \dfrac{1}{m(1-p)}\right)^{m}},
\]

the denominator of which reduces to the left-hand member of (42)
if −1/(1 − p) is called x. Hence this
factor approaches e^(1/(1−p)) and does not become infinite.
The third factor can be rearranged so as to take the form

\[
\left(\frac{1 + \dfrac{1}{m(1-p)}}{1 - \dfrac{1}{mp}}\right)^{n}.
\]

As the quantity within the parentheses is obviously greater
than unity, the factor will be increased if n is replaced by
something larger than itself, as, for example, by mp + 1.
Thus the third factor is less than

\[
\frac{\left(1 + \dfrac{1}{m(1-p)}\right)^{mp}}{\left(1 - \dfrac{1}{mp}\right)^{mp}}
\cdot\frac{1 + \dfrac{1}{m(1-p)}}{1 - \dfrac{1}{mp}} .
\]

As m approaches infinity the last part of this product reduces
to unity; while by identifying mp with the m of (42), and
p/(1 − p) with x, the numerator of the first part becomes e^(p/(1−p));
and by a similar process the denominator becomes e^(−1). As
none of these is infinite (43) must vanish. This proves our
theorem.


§ 45. Justification of the Second Half of Bernoulli’s Theorem 
The second half of Bernoulli’s Theorem is easier to justify.
We know that the ordinate at a particular n is bigger than
at any other n further removed from the most probable. Let
us, then, consider an n equal to mη, where η is not exactly p.
We have, from (43),

\[
P_m(m\eta) = \sqrt{\frac{1}{2\pi\eta(1-\eta)m}}
\left[\left(\frac{1-p}{1-\eta}\right)^{1-\eta}\left(\frac{p}{\eta}\right)^{\eta}\right]^{m},
\]

or

\[
P_m(m\eta) = \sqrt{\frac{1}{2\pi\eta(1-\eta)m}}\; z^{m},
\]

if the bracketed expression is denoted by z.

Our first step is to show that z is always less than unity.
This is best done by finding the maximum value of z: or rather
of log z, for that is easier. We have

\[
\log z = (1-\eta)\,[\log (1-p) - \log (1-\eta)] + \eta\,(\log p - \log \eta).
\]

By actual differentiation it is found that

\[
\frac{d \log z}{d\eta} = \log\left(\frac{p}{1-p}\cdot\frac{1-\eta}{\eta}\right). \tag{44}
\]

If this is to be zero, as it must be for a maximum value,
η must equal p; and when η = p, log z = 0. Hence the
maximum value of z is unity. For any other value of η, z < 1.
Finally suppose η is set equal to that one of the quantities
p − ε, p + ε which happens to give the larger ordinate in
Fig. 9. Then outside the range bounded by these quantities
all the ordinates are smaller than the one at η: that is, they
are all less than $\sqrt{1/[2\pi\eta(1-\eta)m]}\;z^m$. As there are less than m
of them, their sum cannot exceed $\sqrt{1/[2\pi\eta(1-\eta)]}\;\sqrt{m}\,z^m$. But
since z < 1, the quantity $\sqrt{m}\,z^m$ can be made as small as
we like by taking m sufficiently large. In other words, the
sum of all ordinates outside the range p + ε > n/m > p − ε
can be made as small as we wish by taking m large enough.
This proves the second half of our theorem. Later on,
in § 82, we shall find a means of telling how great the probability
is of either n or n/m lying in any such range, without
the labor of computing the individual ordinates. We shall
find, in fact, that if we were to draw another set of distribution
curves for the variable n/√m, the separate curves of this set
would be almost indistinguishable for large values of m. This
is, in a way, suggested by the fact that that factor of (43)
which vanishes as m increases indefinitely, does so because of
the occurrence of √m in the denominator. This suggests that
the maximum ordinates of the curves of Fig. 8 would all be
about equal if they were multiplied by the square roots of their
respective values of m. But if the areas of the individual
rectangles are not to be altered, such a change can only be
brought about by condensing the curves laterally in the same
ratio, which is equivalent to plotting the distribution functions
for n/√m.
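The estimate used in the proof of the second half of the theorem can be tried numerically. The following is a sketch under stated assumptions: p = 1/3 and ε = 0.04 are taken as in Fig. 9, the function names are hypothetical, the quantity called `tail_bound` is the expression √m z^m / √(2πη(1 − η)) evaluated at whichever edge of the band gives the larger value, and the one-term Stirling form is itself only an approximation to the ordinates.

```python
import math


def z(eta, p):
    """The bracketed expression z of the text, for 0 < eta < 1."""
    return ((1 - p) / (1 - eta)) ** (1 - eta) * (p / eta) ** eta


def pmf(m, n, p):
    """Probability of n successes in m trials, via log-gamma."""
    return math.exp(math.lgamma(m + 1) - math.lgamma(n + 1) - math.lgamma(m - n + 1)
                    + n * math.log(p) + (m - n) * math.log1p(-p))


def tail_exact(m, p, eps):
    """Sum of all ordinates with n/m at least eps away from p."""
    return sum(pmf(m, n, p) for n in range(m + 1) if abs(n / m - p) >= eps)


def tail_bound(m, p, eps):
    """sqrt(m) z^m / sqrt(2 pi eta (1 - eta)) at the less favorable band edge."""
    edge = max(z(eta, p) ** m / math.sqrt(2 * math.pi * eta * (1 - eta))
               for eta in (p - eps, p + eps))
    return math.sqrt(m) * edge


for m in (1000, 5000):
    print(m, f"{tail_exact(m, 1 / 3, 0.04):.3e}", f"{tail_bound(m, 1 / 3, 0.04):.3e}")
```

The estimate is crude, as the proof intends, but it falls off rapidly with m and the actual sum lies below it in both cases.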

Though the present is not a suitable place to prove it, this
is actually the case: As the number of trials is increased the 
peaks of Fig. 8 become lower and lower in proportion to the 
reciprocals of the square roots of the number of trials, while the 
curves spread out laterally to greater and greater extents, in 
proportion to the square roots of m. 
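Although the proof is deferred, the statement about the peaks is easy to test. In the sketch below the function name `peak` is hypothetical and p = 1/3 is assumed; the constant 1/√(2πp(1 − p)) with which the product is compared is what the first factor of the Stirling form of the ordinate gives when n = mp, and is quoted here as a plausible limit rather than as a result established in the text.

```python
import math


def pmf(m, n, p):
    """Probability of n successes in m trials, via log-gamma."""
    return math.exp(math.lgamma(m + 1) - math.lgamma(n + 1) - math.lgamma(m - n + 1)
                    + n * math.log(p) + (m - n) * math.log1p(-p))


def peak(m, p):
    """Largest ordinate of the distribution for m trials."""
    return max(pmf(m, n, p) for n in range(m + 1))


p = 1 / 3
limit = 1 / math.sqrt(2 * math.pi * p * (1 - p))
for m in (100, 1000, 10000):
    # The product should be nearly the same for every m.
    print(m, round(peak(m, p) * math.sqrt(m), 4), round(limit, 4))
```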




§ 46. Regarding the Experimental Determination of Probability 


Bernoulli’s Theorem says, in substance, that the chance
of an important difference existing between p, the probability
of success in a single trial, and n/m, the proportion of successes
in m independent trials, may be made as small as we please
by making the number of trials large enough. If this is true,
it is quite obvious that n/m may be accepted for most practical
purposes as an approximation to the probability p; and this
affords us a new way of measuring probability.

In the case of a perfectly good penny, for instance, we could
either form the set of equally likely, mutually exclusive events
“heads, tails,” and conclude at once that the chance of
obtaining a head is 1/2; or we could toss the penny repeatedly
and accept the proportion of successes as the value of the
probability. In this specific case, of course, the former method
would be by far the better, for it gives us an exact answer
whereas the latter method gives only an approximation at best.
But there are many questions to which this exact method
cannot be applied, because no suitable set of alternative events
is known. Take, for example, the chance of a man of twenty
dying between the ages of fifty and fifty-five: it is quite out of 
the question to set up a complete set of equally likely events 
in this case. But it is possible to pick out a large number of 
men of age twenty, and by waiting twenty-five years determine 
what proportion actually die between the specified ages. If 
we are willing to admit that the chance of one of them dying 
is not influenced by what occurs to any of the others, we may 
accept this proportion as a satisfactory experimental deter- 
mination of the desired probability. 
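The experimental method can be imitated with a pseudo-random number generator standing in for the penny. This is a sketch only: the function name, the seed and the numbers of trials are arbitrary choices made for illustration, and the trials are independent only to the extent that the generator makes them so.

```python
import random


def estimate_probability(p, m, seed=0):
    """Proportion of successes n/m in m simulated trials of chance p."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    n = sum(1 for _ in range(m) if rng.random() < p)
    return n / m


for m in (100, 10_000, 100_000):
    print(m, estimate_probability(0.5, m))
```

With a large number of trials the proportion settles close to the value 1/2 obtained by the exact method.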


Bernoulli’s Theorem, therefore, furnishes us an acceptable
makeshift when the direct a priori determination of probability
is not feasible.


But we must observe caution in accepting this argument. 
Bernoulli’s Theorem has been proved by the use of (23), and 
(23) was derived by the use of (20), which itself has only been 
established for cases to which the fundamental definition can 




be applied directly. Hence either Bernoulli’s Theorem itself, 
or the Multiplication Theorem (20), must be put upon a more 
general basis than has so far been done, if we are to use it as a 
justification of this experimental method of measuring prob- 
ability. As the generalization of the Multiplication Theorem 
is not difficult so long as the events A and B are independent
[as they are supposed to be in deriving (23)], we shall give 
this generalized proof, thereby justifying Bernoulli’s Theorem. 
Afterward we shall find it possible to make use of Bernoulli’s 
Theorem itself to extend the Multiplication Theorem to cases 
where the events are not independent and their probabilities 
are not directly obtainable. 


§ 47. The Multiplication Theorem 


We consider any two independent events A and B, the
probabilities of which are p and p′, respectively. The probability
that both occur is some function of p and p′. We
denote it by

\[
P(A, B) = f(p, p'). \tag{45}
\]

It is our problem to determine the form of the function f.

To begin with, we note that if the event A happens to
include two mutually exclusive parts A₁ and A₂ (as “getting
an even number with a die” includes the mutually exclusive
events “getting a two,” “getting a four” and “getting a
six”), the compound event AB will also include two mutually
exclusive parts A₁B and A₂B. By Convention II, therefore,

\[
P(A, B) = P(A_1, B) + P(A_2, B),
\]

and

\[
p = p_1 + p_2,
\]

p₁ and p₂ being, of course, the probabilities of A₁ and A₂,
respectively.

With these facts before us, we need only observe that (45)
is supposed to apply to any two independent events in order to
arrive at the functional equation

\[
f(p_1 + p_2,\, p') = f(p_1, p') + f(p_2, p'),
\]



which can only be satisfied provided f(p, p′) is of the form
p F(p′).¹

Next we notice that there is no logical distinction between
the occurrence of “both A and B” and the occurrence of
“both B and A”: the two expressions are entirely synonymous
so long as the events are independent. Hence

\[
f(p, p') = f(p', p),
\]

or

\[
p\,F(p') = p'\,F(p).
\]

This requires, however, that

\[
\frac{F(p)}{p} = \frac{F(p')}{p'} .
\]

Since the left-hand side of this equation does not contain p′,
it cannot vary as the value of p′ changes. But if the left-hand
side does not vary with p′, the right-hand side cannot. That
is, F(p′)/p′ must be a constant. Call it C. Then F(p′) =
Cp′, and

\[
f(p, p') = p\,F(p') = C\,p\,p'.
\]

Finally, we notice that if the event A is certain to happen,
that is, if p = 1, the occurrence of both A and B is


¹ We can readily show this if we assume the possibility of expanding f(p, p′) in a
Taylor’s Series. We write

\[
f(p, p') = a + bp + cp' + dp^2 + \dots, \tag{46}
\]

whence

\[
f(p_1, p') + f(p_2, p') = 2a + bp_1 + bp_2 + 2cp' + dp_1^2 + dp_2^2 + \dots, \tag{47}
\]

and

\[
f(p_1 + p_2, p') = a + bp_1 + bp_2 + cp' + dp_1^2 + 2dp_1p_2 + dp_2^2 + \dots. \tag{48}
\]

If we equate coefficients of like powers in these equations we get a = 0 from the
constant terms, c = 0 from the terms in p′, d = 0 from the terms in p₁p₂, and so on.
A little consideration shows, in fact, that every term of (46) which contains a power
of p higher than the first will give rise in (48) to cross-products between p₁ and p₂
whereas in (47) no such cross-products can exist. It follows, therefore, that (46)
must reduce to the form

\[
p\,(b + e\,p' + g\,p'^2 + \dots) = p\,F(p').
\]




synonymous with the occurrence of “B”, wherefore we conclude
that f(1, p') = Cp' = p'. This establishes the fact that
C = 1, and completes our proof so far as independent events
are concerned.

Having thus justified the Multiplication Theorem in the
case of independent events, whether or not we are able to
set up the group of events required by our definition of the
measure of probability, we are sure that Bernoulli’s Theorem
is true in general. Suppose, then, that we have a pair of events
A and B which are not independent. Suppose, moreover,
that we make a large number M of independent trials of the
compound event AB. Let there be, among these M trials,
$N_A$ in which A occurs and $N_{AB}$ in which both A and B occur.
Then the ratio $N_A/M$ is not likely to differ much from P(A).
Similarly $N_{AB}/N_A$, which is the “proportion of times both
A and B occur if A does,” is not likely to differ much from
$P_A(B)$. And finally $N_{AB}/M$ is not likely to differ much from
P(AB). In fact the chance of the difference exceeding any
preassigned amount can be made as small as we please in all
three cases by taking M large enough.

Let us denote the differences that actually occur by $\delta$, $\delta'$
and $\delta''$, respectively, so that

$$P(A) = \frac{N_A}{M} - \delta,$$

$$P_A(B) = \frac{N_{AB}}{N_A} - \delta',$$

$$P(AB) = \frac{N_{AB}}{M} - \delta''.$$


Then we find by direct computation that

$$P(A)\,P_A(B) - P(AB) = -\frac{N_{AB}}{N_A}\,\delta - \frac{N_A}{M}\,\delta' + \delta\,\delta' + \delta''. \tag{49}$$

Now suppose $P(A)\,P_A(B) - P(AB) = d$, where $d \neq 0$.
Then one or the other of the $\delta$’s must exceed d/4 in magnitude, for otherwise
the right-hand side could not equal d. There is therefore
unit probability that either the one or the other will exceed
d/4, no matter how large M may be. But this is absurd;
for Bernoulli’s Theorem tells us that this probability (for each
one individually, and therefore also for “one or the other”)
is zero. Hence we must conclude that

$$P(A)\,P_A(B) = P(AB).$$
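
The frequency argument above can be watched at work numerically. What follows is only a minimal sketch under stated assumptions: a single die is rolled M times, A is taken to be “the face is even” and B to be “the face exceeds three” (two events which are not independent), and the name `frequency_check` and the seed are illustrative choices. For this die P(A) = 1/2, $P_A(B)$ = 2/3 and P(AB) = 1/3.

```python
import random


def frequency_check(trials: int, seed: int = 0):
    """Return (N_A / M, N_AB / N_A, N_AB / M) for M = `trials` rolls of one die.

    A = "the face is even", B = "the face exceeds three" (assumed events).
    """
    rng = random.Random(seed)
    n_a = 0   # trials in which A occurs
    n_ab = 0  # trials in which both A and B occur
    for _ in range(trials):
        face = rng.randint(1, 6)
        if face % 2 == 0:
            n_a += 1
            if face > 3:
                n_ab += 1
    return n_a / trials, n_ab / n_a, n_ab / trials


freq_a, freq_b_given_a, freq_ab = frequency_check(200_000)
print("N_A/M    =", freq_a)          # should lie near P(A)   = 1/2
print("N_AB/N_A =", freq_b_given_a)  # should lie near P_A(B) = 2/3
print("N_AB/M   =", freq_ab)         # should lie near P(AB)  = 1/3
```

The product of the first two ratios reproduces the third identically, since $N_A$ cancels; what Bernoulli’s Theorem adds is that each ratio settles near the corresponding probability as M grows.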


PROBLEMS 


1. The fact that (44) is zero at 7 = p has been said to make this 
the value of » for which z is a maximum. It might just as well be 
a minimum, however. How do we know which it is? 


2. The idea of a “ distribution curve ” is obviously a general one. 
Construct a distribution curve showing the probability of runs of 
heads of various lengths in tossing a penny. 


3. Construct a distribution function for the sums of the num- 
bers appearing when two dice are tossed. 


4. Construct a distribution function for the numbers appearing 
when only one die is tossed. 


REFERENCES FOR OUTSIDE READING


COOLIDGE: Probability, pp. 4-10; and Chapter III.

VENN: Logic of Chance, pp. 91-95.

BACHELIER: Calcul des Probabilités, Chapter II.

CZUBER: Wahrscheinlichkeitsrechnung, Vol. 1, pp. 128-139.




CHAPTER V 
PROBABILITY AND EXPERIMENT; BAYES’ THEOREM


§ 48. The “Life on Mars” Paradox


All logical processes are hedged about by a maze of fine 
distinctions which cannot be included in a formal symbolic 
expression without making it so complicated as utterly to 
destroy its usefulness. These distinctions are often important: 
so important in fact that all sorts of errors may arise through 
failing to remember them. In particular, in the Theory of 
Probability there have arisen a host of paradoxes, almost all 
of which are due to using formal logical processes in places 
where, upon recalling their origin, we should not expect them 
to apply. 

One of these is a very famous one, used by the adherents of
“cogent reason” to confound the “insufficient reasonists.”
They raise the question, What is the probability of life on
Mars? Obviously, they say, we are quite ignorant. Therefore,
on the basis of insufficient reason we must admit the
answer to be 1/2.

But the problem can also be attacked this way: What is
the probability of no horses on Mars?, to which the answer is
1/2; What is the probability of no cows on Mars?, to which
the answer is again 1/2; and this process can obviously be
extended to every class of animal or vegetable. Say there are
n of them. Then the probability that all these things are true,
that is, that there are no horses, and no cows, and no other
form of life, is $1/2^n$, which is certainly very small. The
complementary probability, that there is at least one kind of life,
is therefore near certainty. Insufficient reason has therefore
given us two answers, one at least of which must be wrong.
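
To see how quickly the second answer runs toward certainty, the complementary probability $1 - 1/2^n$ may be tabulated. This is a small sketch only; the function name `chance_of_some_life` and the particular values of n are assumptions made for illustration.

```python
from fractions import Fraction


def chance_of_some_life(n: int) -> Fraction:
    """Chance that at least one of n forms exists, each being absent
    with probability 1/2 independently of the rest (the second argument)."""
    return 1 - Fraction(1, 2) ** n


for n in (1, 2, 10, 20):
    print(n, float(chance_of_some_life(n)))
```

With n = 1 the two arguments agree at 1/2; every further class of life admitted moves the second answer nearer to unity.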




Such is the argument as presented; but in spite of its
superficial validity it is not so strong as it seems. To make the
first half (the one leading to 1/2 as an answer) correct, we must
admit ourselves to know nothing whatever which is pertinent
to answering the question. To make the second half (which
leads to a little less than one) correct, we must admit, (a) that
we know “life” to be a complex of forms, and (b) that the
occurrence of one of these forms is quite independent of the
occurrence of the rest. Of course, both postulates as to our
“state of ignorance” are preposterous, which makes it just a
bit harder to think straight about the problem; but we cannot
object to that: the proposer of the question was entitled to
postulate what he would, and we must for the time being
divest ourselves of any additional knowledge we possess. Let
us then put ourselves into a very unusual universe, inhabited
by many kinds of life, the existence or non-existence of no
one of which is prejudiced by the rest. What then is the force
of the second half of the argument? Merely to demonstrate
that this information is pertinent. To one who lived in this
hypothetical universe where all the untruths upon which the
second argument is based were true, the answer would be
nearly one; just as surely as it is also the answer to the question,
What is the chance of a head appearing when a handful of
pennies is tossed in the air? To a six-weeks old baby, as
innocent as the individual to whom the first half applies, 1/2
would be the correct answer. To the Omniscient the answer
is either unity or zero. It is neither to us just because we are
in neither state of knowledge.

What it is to us I cannot say, because I am unable to
evaluate the importance of all the various biological and
astronomical facts which are known to bear upon the question;
but I am not therefore justified in disregarding all I know,
as is done in the first half, and calling the probability 1/2; nor
in disregarding part of what I know, as is done in the second
half, and saying that the probability of there being no life is
extremely small.

But if probability is a measure of the importance of our
state of ignorance it must change its value whenever we add
new knowledge.¹ And so it does. I pick up a coin in the
dark, toss it, and ask the probability that heads is uppermost.
Without a doubt the answer is 1/2. The coin may be biased,
but I have no knowledge regarding any such bias which would
make tails relatively more (or less) probable than heads. It
may even be alike on both sides, but again I know nothing
which makes heads more likely than tails. But suppose I
now take the coin to the light and see that it really has heads on
both sides. Immediately the chance of heads being uppermost
if I toss it again (or that heads were uppermost when
I tossed it before) changes to 1.

The knowledge that the penny is alike on both sides is of
such an absolute character that it enables us to state at once
the change that it makes in our probability. We might,
however, have gotten inferential knowledge instead. For
example, we might have tossed it a large number of times
(100, say) and found that heads appeared every time. Such
a thing might happen either with a perfectly good penny or
with one that has heads on both sides. It might also happen
with a penny which was badly biased. We cannot say with
assurance just which situation exists, but certainly the probability
that heads will appear upon the next toss is no longer 1/2.
It has been changed by experiment.

This, then, leads us to the question of how much it has
changed, to which Bayes’ Theorem is the clue.


§ 49. Bayes’ Theorem 


We return to the consideration of the equations (20) and
(21). Obviously, (20) can be written either in the form

$$P(AB) = P(A)\,P_A(B) \tag{20}$$

or in the form

$$P(AB) = P(B)\,P_B(A), \tag{50}$$


¹ It is sometimes objected that this makes probability a “personal” matter.
So it does, in a sense, but only in the sense in which it is a personal matter: that is,
in the sense in which it depends upon individual differences of knowledge. No one
would say the probability of heads appearing when a penny is tossed was 1/2 to one
who knew. It is only that for all of us when none of us knows a thing about it.




provided that, when any questions of order of occurrence are
involved, the same order shall apply to both.¹
We recall that our symbols have the following meanings:²

P(AB). The probability that A happened and was followed
by B.

P(A). Nothing is known about whether or not B happened.
P(A) is the probability that A did.

P(B). Nothing is known about whether or not A happened.
P(B) is the probability that B did.

$P_A(B)$. A is known to have happened. This is the probability
that it was followed by B.

$P_B(A)$. B is known to have happened. This is the probability
that it was preceded by A.

Since P(AB) means exactly the same thing in both equations,
we may equate their right-hand members and solve for
$P_B(A)$. The result is

$$P_B(A) = \frac{P(A)\,P_A(B)}{P(B)}. \tag{53}$$


¹ In the footnote in § 22 we called attention to the fact that the chance that “a man
is shot and dies,” taken in that temporal, or causative, sequence, is not the same as
the chance that “he dies and is shot,” taken in that order. Suppose we call being
shot the “event A,” and dying “event B.” Then the first order is AB, the other
BA, and their respective probabilities may be denoted by P(AB) and P(BA). To
the first of these corresponds the pair of equations (20) and (50); and to the second an
exactly similar set which we may write in the form

$$P(BA) = P(B)\,P_B(A), \tag{51}$$

$$P(BA) = P(A)\,P_A(B). \tag{52}$$

Obviously the right-hand members of these equations are symbolically equivalent to
(20) and (50). But the equivalence stops with the symbolism. For P(B) in (50)
means “the probability that a man dies,” and the $P_B(A)$ means “the probability
that a dying man has been shot”; while in (51), though the P(B) means just what it
did before, the $P_B(A)$ means “the probability that, having died, he will be shot.”

We shall interest ourselves only in (20) and (50), though the other pair also lead
to a sort of “Bayes’ Theorem” which, under certain circumstances, has a slightly
different meaning from the usual one.


² The use of the past tense is of no significance: present or future would do just
exactly as well, except that grammatically the perfect and past are slightly less clumsy
than future perfect and future, while the “tenseless present” of logic destroys the
sharpness of the distinction between past and present.




In other words, knowing the probability P(A) which would
apply if we knew nothing of whether B occurred or not, and
the probability P(B) which would apply if we were in total
ignorance regarding A, and also the probability of B following
A; knowing these, and also knowing that B did occur, we can
find the probability that it was preceded by A. Equation (53)
is therefore the clue to interpreting the influence of new knowledge
upon probability. It is called Bayes’ Theorem.

Before illustrating its use, we may note that it is usually
given in the somewhat different form to which it reduces when
(21) is substituted for the denominator of (53). This form is

$$P_B(A) = \frac{P(A)\,P_A(B)}{\sum_A P(A)\,P_A(B)}, \tag{54}$$


or, in words:


BAYES’ THEOREM: If the event B never occurs unless
preceded by one or the other of a set of events A, the unconditional
probabilities of which are P(A), and if B is known
to have happened, the chance that it was preceded by a particular
one of the events A is a fraction, the numerator of
which is the product of the probability of this particular
event by the chance of it being followed by B, while the denominator
is a sum of exactly similar terms, one for each of the
events A.¹
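
The rule just stated in words translates directly into a few lines of code. The following is a minimal sketch, not a definitive implementation: the function name `bayes_posterior` and the two-cause example (unconditional probabilities 3/4 and 1/4, chances 1/10 and 1/2 of being followed by B) are assumptions chosen purely for illustration.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Equation (54): priors[i] is P(A_i), likelihoods[i] is P_{A_i}(B).

    Returns the list of P_B(A_i), one for each of the events A.
    """
    # One term P(A) * P_A(B) for each event A.
    terms = [p * l for p, l in zip(priors, likelihoods)]
    # The denominator of (54): the sum of all such terms.
    total = sum(terms)
    if total == 0:
        raise ValueError("B cannot follow any of the events A")
    return [t / total for t in terms]


# Hypothetical example: two possible causes of B.
priors = [Fraction(3, 4), Fraction(1, 4)]
likelihoods = [Fraction(1, 10), Fraction(1, 2)]
post = bayes_posterior(priors, likelihoods)
print(post)  # the less probable cause becomes the more probable once B is seen
```

Exact fractions are used so that the results can be compared term by term with a hand computation.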


§ 50. Some Instructive Illustrations; Bertrand’s “Box Paradox”? 


The following example is an illustration in which there can 
be no question as to our possession of the information necessary 
to the use of Bayes’ Theorem: 


Example 34. Three boxes have in them two coins each. In one
box both are gold, in one both are silver, in the other they are mixed.
Outside they are of identical appearance. A man chooses a box and
takes out a coin, which proves to be gold. What is the chance that the
other coin in the box is also gold?


¹ It must be noticed that throughout the entire formula every symbol refers to
the sequence “A followed by B.” From (51), (52) and (21) an equation can be
obtained which is identical in form, but in which the sequence is always “B followed
by A.” Hence (53) and (54) are true when consistently read either way; but when
the order is of any consequence, as it sometimes is, care must be taken not to mix
the two.

It is not necessary to use Bayes’ Theorem to solve this
problem. The solution can be obtained directly as follows:
Each of the three gold coins is as likely to be the one chosen
as the other. Two are in the box in which “the other is also
gold.” Hence the answer is 2/3. It will, however, serve our
purpose best to get the answer in two rather roundabout ways
corresponding to (53) and (54), respectively, and for this
purpose we choose the following symbolism: $A_{gg}$ for the event
“chooses the box with two gold coins,” and $A_{ss}$, $A_{gs}$ for the
alternative possibilities; $B_g$ for “picks up a gold coin,” and $B_s$
for the opposite.

If we know nothing about what has transpired,¹ the probability
of $A_{gg}$ is $P(A_{gg}) = 1/3$. Likewise, if we know nothing
about what has transpired, one coin is as likely to have been
picked up as another. Hence $P(B_g) = 1/2$. But if the box with
two gold coins was chosen, the chance of picking out a gold
coin was $P_{A_{gg}}(B_g) = 1$. Hence (53) gives for the probability,
after a gold coin has been seen, of the box having two gold
coins,

$$P_{B_g}(A_{gg}) = \frac{P(A_{gg})\,P_{A_{gg}}(B_g)}{P(B_g)} = \frac{\frac{1}{3}\cdot 1}{\frac{1}{2}} = \frac{2}{3}.$$

Or we can phrase the solution this way: $P(A_{ss})$ and
$P(A_{gs})$ are also 1/3, while the conditional probabilities of getting
a gold coin if the chosen box is ss or gs, respectively, are
$P_{A_{ss}}(B_g) = 0$ and $P_{A_{gs}}(B_g) = 1/2$. Hence (54) becomes:

$$P_{B_g}(A_{gg}) = \frac{P(A_{gg})\,P_{A_{gg}}(B_g)}{P(A_{gg})\,P_{A_{gg}}(B_g) + P(A_{ss})\,P_{A_{ss}}(B_g) + P(A_{gs})\,P_{A_{gs}}(B_g)} = \frac{\frac{1}{3}\cdot 1}{\frac{1}{3}\cdot 1 + \frac{1}{3}\cdot 0 + \frac{1}{3}\cdot\frac{1}{2}} = \frac{2}{3}.$$

As was to be expected, the answer is the same in all three cases.
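
The three routes to the answer of Example 34 can be checked mechanically. The sketch below assumes nothing beyond the statement of the example; the labels `gg`, `ss`, `gs` follow the symbolism above, while the variable names are illustrative only.

```python
from fractions import Fraction

# The three boxes of Example 34 and their contents.
boxes = {"gg": ("gold", "gold"), "ss": ("silver", "silver"), "gs": ("gold", "silver")}

# Direct count: every (box, coin) pair is as likely to be drawn as any other.
draws = [(name, i) for name, coins in boxes.items() for i in range(2)]
gold_first = [(name, i) for name, i in draws if boxes[name][i] == "gold"]
other_gold = [(name, i) for name, i in gold_first if boxes[name][1 - i] == "gold"]
direct = Fraction(len(other_gold), len(gold_first))

# Unconditional probabilities P(A) and conditional probabilities P_A(B_g).
prior = {name: Fraction(1, 3) for name in boxes}
like = {name: Fraction(coins.count("gold"), 2) for name, coins in boxes.items()}

# Equation (53): divide by the unconditional probability P(B_g) = 1/2.
p_gold = Fraction(len(gold_first), len(draws))
via_53 = prior["gg"] * like["gg"] / p_gold

# Equation (54): divide by the sum of similar terms, one for each box.
via_54 = prior["gg"] * like["gg"] / sum(prior[b] * like[b] for b in boxes)

print(direct, via_53, via_54)
```

All three quantities come out as the fraction 2/3, in agreement with the text.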


¹ Or, “will transpire.”


§ 51. Some Instructive Illustrations; Another Urn Problem 


The following is another problem to which Bayes’ Theorem 
affords the solution: 


Example 35. A box has had ten balls put in it by the following
procedure: An auxiliary container holds equal numbers of black and
white balls, thoroughly mixed. A blindfolded man picks one out and
places it in the box. He is watched by an assistant who immediately
puts another ball of the same kind in the auxiliary container and stirs
up the contents. Then a second ball is drawn, and so on. After the
box has received its ten balls, an experiment is performed by drawing
balls repeatedly from it, noting the color, and replacing them. Five
such attempts show four black balls and one white. What is the most
probable contents of the box?


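Before the experiment, the probability of A white and
10 − A black balls was obviously:

$$P(A) = C^{10}_A \left(\tfrac{1}{2}\right)^{10}.$$

If A white balls are there, the chance of drawing one white
and four black in five attempts (this is the “event B” which is
known to have happened), is

$$P_A(B) = C^5_1 \left(\frac{A}{10}\right)\left(\frac{10 - A}{10}\right)^4.$$

Hence Bayes’ Theorem gives

$$P_B(A) = \frac{C^{10}_A \left(\tfrac{1}{2}\right)^{10} C^5_1 \left(\dfrac{A}{10}\right)\left(\dfrac{10 - A}{10}\right)^4}{\displaystyle\sum_{A=0}^{10} C^{10}_A \left(\tfrac{1}{2}\right)^{10} C^5_1 \left(\dfrac{A}{10}\right)\left(\dfrac{10 - A}{10}\right)^4}.$$

Obviously the $(1/2)^{10}$, $C^5_1$, and the 10’s by which the A’s are
divided are common factors of all terms. Hence they can be
cancelled, thus yielding

$$P_B(A) = \frac{C^{10}_A\,A\,(10 - A)^4}{\displaystyle\sum_{A=0}^{10} C^{10}_A\,A\,(10 - A)^4}. \tag{55}$$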


Actual computation gives the results shown in Table XII. 




The required probabilities are in the last column. Obviously
4 white and 6 black is the most likely composition.

Before the experiment was performed the most probable
composition was half and half. The preponderance of black
results has altered the probabilities in the direction which
common sense would dictate; but the experimental proportion
(1/5 white) is still only one-third as likely as the most probable
proportion (2/5); and only half as likely as the half-and-half
division.


TABLE XII

  A    $C^{10}_A$    $A(10 - A)^4$    $C^{10}_A A(10 - A)^4$    $P_B(A)$
  0        1              0                    0               0.00000
  1       10           6561               65,610               0.01837
  2       45           8192              368,640               0.10323
  3      120           7203              864,360               0.24204
  4      210           5184            1,088,640               0.30484
  5      252           3125              787,500               0.22051
  6      210           1536              322,560               0.09032
  7      120            567               68,040               0.01905
  8       45            128                5,760               0.00161
  9       10              9                   90               0.00003
 10        1              0                    0               0.00000

                                  Σ = 3,571,200
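
The entries of Table XII are easily reproduced from (55). The following sketch assumes only what Example 35 states; the function name `posterior_white` and its parameters are illustrative, not anything prescribed by the text.

```python
from fractions import Fraction
from math import comb


def posterior_white(white_seen: int, black_seen: int, box_size: int = 10):
    """Evaluate (55): weights C(10, A) * A**white_seen * (10 - A)**black_seen
    for A = 0..10, their sum, and the probabilities P_B(A)."""
    weights = [
        comb(box_size, a) * a**white_seen * (box_size - a) ** black_seen
        for a in range(box_size + 1)
    ]
    total = sum(weights)
    return weights, total, [Fraction(w, total) for w in weights]


# Five attempts: one white and four black.
weights, total, post = posterior_white(1, 4)
for a, (w, p) in enumerate(zip(weights, post)):
    print(f"{a:2d} {w:>10,d} {float(p):.5f}")
print("sum =", f"{total:,d}")
```

The printed columns correspond to the first, fourth and fifth columns of the table, and the sum to the figure beneath it.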


Suppose, instead of five trials, fifty were made, giving
10 white and 40 black. Common sense tells us the experimental
evidence must now be given more weight than before,
but common sense does not tell us how much more. Bayes’
Theorem does. For now we must change the powers of A
and (10 − A) in (55), to 10 and 40 instead of 1 and 4. This,
however, replaces all the numbers in the third column of
Table XII by their tenth powers, which quite obviously causes
the third entry (8192) to increase very much more than any
of the rest. In fact, the next largest entry (7203) is only about
0.9 times as large as 8192; after the tenth powers are taken
it will be about 1/4 as large. Hence when these numbers are
multiplied by 45 and 120, respectively, the latter of which is
about three times the former, they will yield new probabilities
which do not differ much in magnitude. Of course the rest
of the new probabilities will be very much smaller.

The exact results are shown in the second column of
Table XIII; while the third column contains the probabilities
after 500 tests have given 400 black and 100 white balls. In
the last case the experimental evidence completely outweighs
any preconceptions arising from our knowledge of how the
box was filled. It is 99.999 per cent sure that there are just
two white balls.


TABLE XIII

                  $P_B(A)$
  A    After 50 Tests    After 500 Tests
  0       0.00000            0.00000
  1       0.01334            0.00000
  2       0.55276            0.99999
  3       0.40714            0.00001
  4       0.02657            0.00000
  5       0.00020
  6       0.00000
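
The effect of lengthening the experiment can be computed in the same way, by changing only the powers in (55). This is again a sketch under the assumptions of Example 35; the helper `posterior_after` is a hypothetical name, and exact integer arithmetic is used because the weights become very large.

```python
from fractions import Fraction
from math import comb


def posterior_after(white_seen: int, black_seen: int, box_size: int = 10):
    """P_B(A) for A = 0..10 from (55), with the powers of A and (10 - A)
    replaced by the numbers of white and black balls observed."""
    weights = [
        comb(box_size, a) * a**white_seen * (box_size - a) ** black_seen
        for a in range(box_size + 1)
    ]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


after_50 = posterior_after(10, 40)
after_500 = posterior_after(100, 400)
for a in range(7):
    print(f"{a} {float(after_50[a]):.5f} {float(after_500[a]):.5f}")
```

After fifty tests two and three white balls remain serious rivals; after five hundred the composition with two white balls has absorbed practically the whole probability.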


§ 52. Some Instructive Illustrations; The Bad Penny 


Let us consider one more example of the use of Bayes’
Theorem, again in connection with a problem of no practical
consequence, though it may aid us in gaining an accurate
picture of what the theorem is good for.

We suppose, to begin with, that a “penny” is given us,
and that we are asked to determine whether it is alike on both
sides, or normal. Of course the sensible thing to do would
be to look and see; but this is not a sensible problem. Instead
we propose to find out by tossing it repeatedly and noting
what shows up. At the start, before any experiments have
been made, there is a certain probability that it is bad (alike
on both sides). We cannot say what this is: we may have been
told by someone in whom we have considerable confidence
that it is bad, and in that case the probability is high. Or
we may have been told that it is good, and the probability is
low. Whatever it is, let it be denoted by $P_0(b)$, and the
complementary probability that the penny is good by $P_0(g)$.
Suppose, now, that n throws are made and all of them result
in heads. If the penny has heads on both sides, the chance
of this is $P_b(n) = 1$; while if it is a good penny the chance of a
run of n heads is only $P_g(n) = (1/2)^n$. Substituting these values
in (54) we obtain at once, as the probability, after the experiment,
that the penny is bad,

$$P_n(b) = \frac{P_0(b)\cdot 1}{P_0(b)\cdot 1 + P_0(g)\cdot\left(\tfrac{1}{2}\right)^n},$$

or, if we write $P_0(g)/P_0(b) = k$, for simplicity,

$$P_n(b) = \frac{1}{1 + \dfrac{k}{2^n}}.$$


This result depends upon two things, as it should: upon
the degree of our assurance before we experiment, and upon
the number of trials carried out. If there is no reason to
suspect the penny (for instance, if it is a coin casually picked
up on the street) $P_0(g)$ certainly exceeds $P_0(b)$ and k is large;
for there is obviously much greater likelihood of happening
on a good coin than a bad one. Suppose we assume that there
is only one chance in a million that it is bad, which makes
k = 1,000,000. Then after ten heads have appeared without
interruption the chance of it being bad is¹ $P_{10}(b) = 1/1000$.
The new probability, though still small, is very much larger
than before the experiment was performed. If the run continues
and twenty heads appear, the probability becomes 1/2:
twenty heads, in other words, just counterbalance our
preconceived notion that the penny was very probably good.
Thirty heads, on the other hand, give a probability $P_{30}(b) = 0.999$.
There is now only one chance in a thousand that the
penny is not bad.
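
The figures just quoted follow from the formula $P_n(b) = 1/(1 + k/2^n)$, and can be recomputed without the round-number approximation. The sketch below assumes the value k = 1,000,000 used in the text; the function name `prob_bad` is an illustrative choice.

```python
from fractions import Fraction


def prob_bad(k: int, n: int) -> Fraction:
    """P_n(b) = 1 / (1 + k / 2**n): chance that the penny is bad after a
    run of n heads, where k = P_0(g) / P_0(b)."""
    return 1 / (1 + Fraction(k, 2**n))


k = 1_000_000  # one chance in a million that the penny is bad
for n in (0, 10, 20, 30):
    print(n, float(prob_bad(k, n)))
```

With the true value $2^{10} = 1024$ the results differ only slightly from the round numbers of the text: a little over 1/1000 after ten heads, a little over 1/2 after twenty, and a little over 0.999 after thirty.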


¹ These are all round numbers based upon the approximation $2^{10} = 1000$, which
is plenty near enough for our purpose here. The true value is 1024.




Of course these figures have resulted only because of our
assumption of 1/1,000,000 as the a priori chance that the thing
is bad, and we must not lose sight of the fact that we have no
way to check this guess. As exact values they are worthless,
but they serve to illustrate how rapidly an uninterrupted run
of luck may wipe out a strong presumption in one direction
and replace it by an equally strong presumption the other
way. Had we chosen any other number than a million the
result would have been much the same: if we had guessed the
chance of the penny being bad to be the inconceivably small
number 0.000,000,000,001, it would still require only fifty heads
in succession to replace this probability by 0.999.
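
The length of run needed to overturn a given presumption can be had by solving $1/(1 + k/2^n) \geq P$ for n, which gives $2^n \geq kP/(1 - P)$. The sketch below is offered under that reading only; the function name `heads_needed` and its arguments are assumptions made for illustration.

```python
from math import ceil, log2


def heads_needed(prior_bad: float, target: float) -> int:
    """Smallest run of heads n for which P_n(b) = 1 / (1 + k / 2**n)
    reaches `target`, where k = (1 - prior_bad) / prior_bad."""
    k = (1 - prior_bad) / prior_bad
    # 1 / (1 + k / 2**n) >= target  is equivalent to  2**n >= k * target / (1 - target)
    return max(0, ceil(log2(k * target / (1 - target))))


print(heads_needed(1e-6, 0.5))     # run that just counterbalances the presumption
print(heads_needed(1e-6, 0.999))   # run that reverses it
print(heads_needed(1e-12, 0.999))  # the same, starting from the far smaller guess
```

Because n enters through $2^n$, multiplying k a millionfold adds only about twenty heads to the run required.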

Finally, suppose we were mathematically certain that the
penny was good: suppose, in other words, that $P_0(b)$ were
zero, not approximately, but absolutely. Then k would be
infinite, and so also would $k/2^n$ no matter how great n might be.
In this case $P_n(b)$ would be zero for every value of n. This,
too, is as it should be, for experimental evidence is trivial
beside infallible certainty. How we could ever reach such a
state of absolute assurance I do not know; but if we could, no
amount of experimentation should be allowed to shake our
faith.


§ 53. The Uses to Which Bayes’ Theorem May Be Put


There are many important problems of a scientific character 
which are essentially similar to the one we have just been 
considering. Almost any instance where a scientific generaliza- 
tion is to be made from a limited amount of data could be cited 
in illustration: for example, the conclusion that all electrons 
have like charges. One bit of evidence consists of the fact 
that a certain number have been isolated and their charges 
measured, and there is a considerable volume of indirect evi- 
dence. But how sure are we of the rule? Not absolutely 
sure, certainly; for though it is pretty generally believed, it 
has occasionally been contested. 

We would be glad, if we could, to get a measure of our
certainty in cases such as this; but there is nothing in the
Theory of Probability to aid us except Bayes’ Theorem, which
we can seldom use for the purpose because, as in the problem
of the last section, we cannot measure the unconditional
probabilities. Any of the answers which we have obtained
would be correct if k had the values assumed; other answers
could easily be obtained if k were something else; but as long
as we do not know what k is, we cannot be certain of any of
them.

To such problems, then, no exact answer can be expected
from the use of Bayes’ Theorem; not because of any logical
uncertainty as to the theorem itself,¹ but because we do not
possess the data necessary for its use. On the other hand
it is often of service in dealing with problems to which qualitative
answers are acceptable. Thus in the case of the bad
penny example of § 52, we might desire to know how many
throws would be required to justify the belief that the penny
is bad. To this question the answer “not less than twenty
nor more than fifty” can safely be given: not very close
limits, to be sure, but at least indicating the order of magnitude.
A smaller number of throws would give us very little information,
because of the inherent probability that the penny is
good; and a larger number would not increase our certainty
to any material extent.²

There are also cases in which, although we have no exact
means of determining the a priori probabilities, we have
reason to believe we know them to a fair degree of approximation.
An example is given in the section immediately following.
To such problems I think it is wise to apply the theorem, for
while no exact information can be expected from this course,
better information will be obtained than can be gotten in any
other way. It will need judicious interpreting, of course, but
that is not an uncommon state of affairs when mathematical
reasoning is applied to scientific problems.


¹ Due to numerous inexact statements which have been made of it, Bayes’ Theorem
has been the subject of much adverse criticism, and some authorities have even gone
so far as to reject it entirely. At present, however, this criticism seems to be dying
out, the commonly accepted view being much the same as that stated above: that
it is just as sound logically as any other part of the Theory of Probability, and may be
trusted to give reliable results when we can get a grip on it. The trouble is that we so
seldom can.


² This is true, not only because the result given by Bayes’ Theorem is already
substantially 1, but also because other alternatives, originally neglected because of
their inherent absurdity, become of greater and greater consequence as the run
proceeds. Suppose, after a run of 100 heads, a tail were to appear. The chance of this
happening with a good penny is so extremely small, that hallucination or substitution
would merit serious consideration.

Finally, Bayes’ Theorem can often be applied with absolute 
rigor in the course of a formal mathematical argument. It 
is unnecessary to introduce artificial illustrations of its use in 
this direction, however, as enough examples will arise naturally 
in the course of our further studies. 


§ 54. Some Instructive Illustrations; An Elementary Problem in 
Sampling 


Consider the following problem in the testing of factory
output:

Example 36. A factory produces a certain type of screw as a
standard product. The screws are collected at the machine in boxes
of 1200 each. Long experience has shown that the proportion of these
boxes which contain various percentages of bad screws is substantially
as follows:


  Percentage of        Proportion of Boxes
  Bad Screws           Observed to Contain
  in the Box           this Percentage of
                       Bad Screws

       0                    0.78
       1                    0.17
       2                    0.034
       3                    0.009
       4                    0.005
       5                    0.002
       6                    0.000


Two per cent badness has been adopted as a manufacturing standard;
that is, any box which contains 2 per cent or less of bad screws is regarded
as satisfactory, the aim of the inspection process being to reject those
which are poorer. The normal inspection consists in the examination
of 50 screws out of each box. A particular box, produced at a time when
there was no special reason to suspect that the machines were not operating
properly, showed 6 bad screws under normal inspection. What is
the probability that the manufacturing standard had not been maintained
in the production of this box?


We know from Bernoulli’s Theorem that the proportions listed
in the second column of this table are probably good approximations
to the probabilities of the various percentages of badness. They
may therefore be used as the values of P(A) in (54), the “event A”
standing successively for the various possible percentages of bad
screws in the box. The conditional probabilities then follow at once
from (25), p and q being, respectively, 6 and 44, while m and n are
12A and 1200 − 12A. When these are substituted in (54), and
obvious common factors are cancelled out, they yield the formula

$$P_B(A) = \frac{P(A)\,C^{12A}_6\,\dfrac{(1200 - 12A)!}{(1200 - 12A - 44)!}}{\displaystyle\sum_A P(A)\,C^{12A}_6\,\dfrac{(1200 - 12A)!}{(1200 - 12A - 44)!}}.$$

It is a comparatively simple matter to compute the values of this
expression. The outline of the computation is shown in Table XIV.
The second column contains the values of P(A) as given in the
statement of the example; the third column contains the binomial
coefficients as taken from Appendix III; while the fourth column contains
the values of (1200 − 12A)!/(1200 − 12A − 44)!. These were found by the use
of Appendix II, in which the logarithms of large factorials are given.
When these three columns are multiplied together they give the fifth.


TABLE XIV

  A    P(A)     $C^{12A}_6$        (1200 − 12A)!/(1200 − 12A − 44)!    Product         $P_B(A)$
                                                                       (× 10^139)
  0    0.78     0.0000             1.372 × 10^135                      0.000           0.000
  1    0.17     9.2400 × 10^2      8.746 × 10^134                      0.014           0.004
  2    0.034    1.3460 × 10^5      5.548 × 10^134                      0.254           0.070
  3    0.009    1.9478 × 10^6      3.503 × 10^134                      0.614           0.170
  4    0.005    1.2272 × 10^7      2.201 × 10^134                      1.351           0.374
  5    0.002    5.0064 × 10^7      1.376 × 10^134                      1.378           0.382
  6    0.000                                                                           0.000

                                                                   Σ = 3.611


Each of the entries in this column is the numerator of the fraction
which represents $P_B(A)$ for the corresponding value of A; and the
sum of the entire column, $3.611 \times 10^{139}$, is the denominator for all of them.
Hence the desired probabilities are to be found by dividing every
entry by $3.611 \times 10^{139}$. The results are shown in the last column.

The chance that more than 2 per cent are bad is the sum of the
last four entries in this column, 0.926, and is so large as to render it highly
probable that trouble exists.
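
With exact integer arithmetic the logarithm tables are unnecessary, and the whole computation for Example 36 can be sketched in a few lines. The sketch assumes the proportions given in the statement of the example and a box of 1200 screws, with the conditional probability taken as the chance of drawing 6 bad and 44 good screws without replacement; the names `prior`, `likelihood` and `p_substandard` are illustrative only.

```python
from fractions import Fraction
from math import comb

BOX, SAMPLE, BAD_SEEN = 1200, 50, 6

# P(A): observed proportions of boxes containing A per cent of bad screws.
prior = {
    0: Fraction("0.78"), 1: Fraction("0.17"), 2: Fraction("0.034"),
    3: Fraction("0.009"), 4: Fraction("0.005"), 5: Fraction("0.002"),
    6: Fraction(0),
}


def likelihood(percent_bad: int) -> Fraction:
    """P_A(B): chance of 6 bad and 44 good screws in a sample of 50 drawn
    from a box holding 12A bad and 1200 - 12A good ones."""
    bad = BOX * percent_bad // 100
    good = BOX - bad
    ways = comb(bad, BAD_SEEN) * comb(good, SAMPLE - BAD_SEEN)
    return Fraction(ways, comb(BOX, SAMPLE))


terms = {a: prior[a] * likelihood(a) for a in prior}
total = sum(terms.values())
posterior = {a: terms[a] / total for a in prior}
p_substandard = sum(p for a, p in posterior.items() if a > 2)

for a in sorted(posterior):
    print(a, round(float(posterior[a]), 3))
print("more than 2 per cent bad:", round(float(p_substandard), 3))
```

The factors common to every term, which the text cancels by hand, are simply carried along here and disappear in the division by `total`.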

It is unnecessary to say that if the manufacturing situation 
postulated in the example really existed, this sort of computation 
would be made once for all for such results as were likely to be met. 
Thereafter it would be necessary only to refer to the tabulated prob- 
abilities to learn the significance of any set of results; or, more 
probably, that number of defective screws would be determined for 
which there was an even chance of trouble existing, and some routine 
would be established to assure that the trouble was quickly located 
and corrected. The exact manufacturing conditions would determine 
what sort of routine was best. 

However, the entire problem is, in a way, idealistic. In the 
first place, we have tacitly assumed in its statement, and in computing 
our results, that such proportions of product as 1 per cent, 2 per cent,
etc., might be bad, but not 1.5 per cent or other fractional percentages.
Obviously, this is not the case: for out of the 1200 screws in a box
any multiple of 1/12 per cent is a possibility; and in a more general
type of problem the variable might be capable of taking almost any
values. We must therefore interpret 0 per cent as including all
cases for which the actual percentage is less than 1/2 per cent; 1 per cent
as including all other cases less than 1.5 per cent, and so on. It is a
matter of experience that such grouping of data frequently causes 
so little error that the added cost of more complete computation is 
not warranted. Hence, on the whole, the computation is probably 
as good as any practical situation of its kind is likely ever to warrant. 
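
The outline of the computation given above can be sketched in a few lines of Python. This is an illustration only, under assumptions that are not in the text: the prior values P(A) are hypothetical stand-ins for the second column of Table XIV, the sample is taken to contain 6 defective screws along with the 44 good ones, and the binomial coefficient is taken to be the number of ways of choosing those defectives from the 12A bad screws in the box. `math.lgamma` plays the part of the table of logarithms of factorials.

```python
import math

def log_falling_factorial(n, k):
    """Logarithm of n!/(n-k)!, found from log-factorials as in a printed table."""
    return math.lgamma(n + 1) - math.lgamma(n - k + 1)

def posterior(priors, box=1200, good_drawn=44, bad_drawn=6):
    """Map each per cent defective A to P_B(A).

    priors maps A (whole per cents) to P(A). bad_drawn=6 is an assumed
    value for illustration; only the 44 good screws appear in the formula.
    """
    log_terms = {}
    for a, p in priors.items():
        bad_in_box = box * a // 100          # 12A bad screws when box = 1200
        if p == 0 or bad_in_box < bad_drawn:
            continue                         # hypothesis cannot produce the sample
        log_terms[a] = (math.log(p)
                        + math.log(math.comb(bad_in_box, bad_drawn))
                        + log_falling_factorial(box - bad_in_box, good_drawn))
    peak = max(log_terms.values())           # subtract the peak to avoid overflow
    weights = {a: math.exp(v - peak) for a, v in log_terms.items()}
    total = sum(weights.values())            # the common denominator
    return {a: weights.get(a, 0.0) / total for a in priors}

# Hypothetical a priori probabilities, not the values of Table XIV.
example_priors = {0: 0.5, 1: 0.3, 2: 0.1, 3: 0.05, 4: 0.03, 5: 0.02}
for a, prob in posterior(example_priors).items():
    print(f"A = {a} per cent: P_B(A) = {prob:.4f}")
```

Working with logarithms and subtracting the largest term before exponentiating keeps the very large factorial ratios within floating-point range; the division by `total` corresponds to dividing every entry of the fifth column by its sum.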


PROBLEMS 


1. The following has been given as an example of the incorrect 
use of Bayes’ Theorem when applied to Example 34. (It deals with
the chance of getting unlike coins instead of like ones.) 


“There is a 1/2 chance that the coin first seen shall be gold. When
gold has been seen, we know that we have chosen one of the first two
boxes, but we do not know which. They are equally likely, hence
the chance for a gold coin followed by silver is 1/4. There is an equal
chance for a silver coin followed by gold. Hence the total chance
is 1/2.”

What is wrong with this argument? 




2. The same author gives the following as a correct solution: 


“If a gold coin has been seen, the a priori chance for the first
or second box is 1/2, but whereas the first has a chance 1 of showing a
gold coin the first time, the second has only a chance 1/2 of doing so.
The probability that the gold coin is in the second box is

$$\frac{\frac12\cdot\frac12}{\frac12\cdot 1+\frac12\cdot\frac12}=\frac13,$$

and there is a similar probability for a silver coin.”
This argument is also wrong. Explain why. 


3. A box has been filled as in Example 35. We are told, and 
have complete faith in the information, that 50 balls have been drawn 
and have given 10 white and 40 black. We, however, make an 
independent test by drawing 5 balls, all of which turn out white. 
What effect has this upon the probabilities of various proportions 
of the two colors? 


4. If the proportions of Example 35 turned out to be half and half, 
how would the probabilities be affected? 


5. In the “bad penny” problem of § 52, suppose the first six throws
show a run of five heads followed by a tail. What happens to the 
probabilities? 


6. Suppose instead that 600 throws showed a run of 599 heads 
followed by a tail. Would the situation be any different? 

(More than a simple “yes” or “no” is wanted to this problem.
State as clearly as you can your reactions to it as a problem in
logic.)


REFERENCES FOR OUTSIDE READING


1. Coolidge: Probability, Chapter VI.
2. Arne Fisher: Mathematical Theory of Probabilities, pp. 54-67.
3. Venn: Logic of Chance, pp. 179-202, 249-264.
4. Czuber: Wahrscheinlichkeitsrechnung, pp. 189-210.
5. Bachelier: Calcul des Probabilités, Chapter XXIII.
6. Laplace: Théorie Analytique des Probabilités, Chapter VI.
7. Molina and Crowell: Deviation of Random Samples from
Average Conditions and Significance to Traffic Men, Bell System
Technical Journal, Vol. 3 (1924), pp. 88-99.


CHAPTER VI 
DISTRIBUTION FUNCTIONS AND CONTINUOUS VARIABLES


§ 55. The Random Choice of a Point on a Line Segment 


So far in our study we have dealt only with numbers and 
events which were essentially discrete. For instance, if a penny 
is tossed, two discrete events—heads and tails—are possible. 
In the nature of things there can be no intermediate state of 
affairs. Or if the penny is tossed repeatedly we may get 
various numbers of heads: but always integral numbers, never 
such a result as 1.732 heads. 

There are, however, many sets of events which do not fall 
into such discrete categories, although they are proper subject
matter for a mathematical study of probability. For example, 
in the production of lamps the ideal situation would be to have 
them all of exactly the same resistance. The factory process 
aims at that, but certainly does not accomplish it. Neither is 
it true that a certain discrete set of resistances may occur, and 
that others are totally impossible. Instead, resistance can vary 
continuously. 

This difficulty presents itself immediately: if there are not 
discrete events, there cannot be groups and sub-groups suitable 
for the application of our definition of probability. It would 
seem, therefore, that some entirely new definition would be 
necessary. This is not quite true, however, for the difficulty 
really resides in the notion of a continuous variable rather than 
in the definition of probability, and may be removed by exactly 
the same sort of argument as is used in defining irrational 
numbers. 



It is the principal purpose of the next few sections to make 
this clear. As the first step in this direction we begin with an 
example of so special a form that it may appear at first sight 
to have little to do with the fundamental problem, but from it, 
by successive generalizations, we shall be able to obtain what 
we seek. 


Example 37.—The Random Choice of a Point on a Line Segment: 
The perimeter of a well-balanced wheel is of unit length, and carries a
uniform scale, such as the scale of a yard stick. A pointer of negligible
thickness is set up opposite this wheel, and the wheel is spun. When
it comes to rest the pointer indicates some number on the scale. What
is the chance that it lies in the interval between two numbers a and b?¹


To start with, suppose this interval is 0.7 to 0.8. We know
that the ten segments marked off by the points 0.0, 0.1,
0.2, ..., 0.9 are all of equal length. There are therefore ten
equally likely and mutually exclusive events corresponding
to the ten segments within which the pointer may possibly
rest. Of these only one belongs to the desired sub-group.
Here our original definition applies, and we find
that the probability of the pointer resting between 0.7 and 0.8
is 1/10.

If we modify the requirements of the problem by demanding 
the probability that the pointer come to rest between 0.70 
and 0.71, we are able to divide up the entire perimeter of the 
wheel into 100 equally likely divisions; and we conclude at 
once that the desired probability is 1/100.

To take a somewhat more general case, let us call the
length of the interval b − a = x. If x is a rational number
(that is, if it can be represented as the quotient of two integers
n/m), it is possible to start from a and lay off the perimeter
of the wheel into m equal divisions, of which exactly n are


¹ It may avoid certain logical difficulties to regard this interval as containing a
but not b. That is, if the pointer rested on a it would be said to be in the interval,
but if it rested on b to be without it. The only purpose of such a convention is so to
arrange matters that the sum of the intervals from a to b and from b to c is just exactly
the interval from a to c. If we made any other convention, we would either include
one point twice, or omit it altogether.




included in the interval from a to b. Obviously the probability
of the pointer coming to rest within this interval is just
n/m =x: that is, it is the length of the interval. Hence, 
whenever the length of the interval is a rational number, the 
probability of coming to rest in it is equal to its length. 

But what if the interval is not representable as the quotient 
of two integers? In this case our fundamental definition of 
the measure of probability breaks down; for it is impossible 
to find any method of division which divides both the perimeter 
of the wheel and the segment exactly. 

Since this difficulty arises only when the number x is 
irrational, it is natural to go for relief to the branch of mathe- 
matics which deals with the nature of irrational numbers. 
When we do this, we find an even more fundamental difficulty: 
an irrational number cannot be written in our ordinary number 
system at all, and requires a totally different set of ideas for 
its definition. We get our first insight into the relation which 
these irrational numbers bear to the rest of our number system 
—and at the same time an indication of how to overcome our 
present difficulty — by considering how we deal with them in 
practical life.¹

Suppose we choose 1/√2 as our illustration. We ordinarily
write it as 0.7 or 0.707, or 0.7071; understanding in
each case that what we have written is the nearest tenth, or
thousandth, or ten-thousandth. We could easily frame the 


¹ Some rational numbers cannot be written as “decimal fractions” because the
base of our number system is 10. The fraction 1/3 is of this class, for it leads to the
“repeating decimal” 0.333...; but if the base of our number system were a multiple
of 3 this difficulty would disappear. Thus, if the base were 12 the fraction
1/3 would be represented by the decimal 0.4, for 0.4 would then mean 4/12 instead of 4/10.

The difficulty which we have above is not of this type, but is a fundamental fact 
in the logic of number and persists whatever base may be chosen. It can be shown 
that in all cases where the difficulty is due to the choice of the base 10, the succession 
of digits which makes up the decimal ultimately begins to repeat itself; that is, that 
all rational numbers can be represented by either “repeating” or “ terminating 
decimals.” Irrational numbers, on the other hand, not only cannot be represented 
by terminating decimals nor by the quotient of two integers: they cannot even be 
represented by repeating decimals. 




same idea in the form of a set of inequalities if we so desired. 
For example: 


$$
\begin{aligned}
0 &< 1/\sqrt{2} < 1,\\
0.7 &< 1/\sqrt{2} < 0.8,\\
0.70 &< 1/\sqrt{2} < 0.71,\\
0.707 &< 1/\sqrt{2} < 0.708.
\end{aligned}
$$

Such a set of inequalities could be extended as far as we pleased.
This is not only a practical expedient, however. Logically
also, it leads to important consequences; for it shows that we
can set up a sequence of rational numbers, 0, 0.7, 0.70,
0.707, ..., approaching 1/√2 as a limit, though each number
is known to be less than 1/√2; and another sequence of
rational numbers, 1, 0.8, 0.71, 0.708, ..., which likewise
approaches the limit 1/√2, though each of its terms is greater
than 1/√2. The same is true of any other irrational number:
π, for example, is approached by the sequence 3, 3.1, 3.14,
3.141, ..., every term of which is less than π; and also by the
sequence 4, 3.2, 3.15, 3.142, ..., every term of which exceeds
π. We conclude, therefore, that every irrational number is the
limit of at least two sequences of rational numbers, one approaching
it from below, the other from above.
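
The two bracketing sequences for 1/√2 can be generated with exact rational arithmetic. The sketch below is illustrative only; the function name `bracket_inv_sqrt2` is ours, and the test that a decimal d/10^k lies below 1/√2 is carried out on integers, as 2d² < 10^(2k), so that no rounded square root ever enters the comparison.

```python
from fractions import Fraction
from math import isqrt

def bracket_inv_sqrt2(k):
    """Return the k-place decimals just below and just above 1/sqrt(2)."""
    scale = 10 ** k
    # Largest integer d with d*d <= scale*scale // 2, hence (d/scale)**2 < 1/2.
    d = isqrt(scale * scale // 2)
    return Fraction(d, scale), Fraction(d + 1, scale)

for k in range(5):
    lower, upper = bracket_inv_sqrt2(k)
    # Both comparisons are exact: Fractions are never rounded.
    assert lower * lower < Fraction(1, 2) < upper * upper
    print(k, float(lower), float(upper), "gap =", upper - lower)
```

The gap between the two terms is exactly 1/10^k, so the sequences close in on the same limit from opposite sides.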
Let us now return to the discussion of our revolving wheel.
If x is irrational, we know that there must be a sequence of
rational numbers

$$\frac{n_1}{m_1},\ \frac{n_2}{m_2},\ \frac{n_3}{m_3},\ \ldots,$$

all less than x, but approaching it as a limit; and another
sequence

$$\frac{N_1}{M_1},\ \frac{N_2}{M_2},\ \frac{N_3}{M_3},\ \ldots,$$

all greater than x, but approaching it from above. Suppose
that we locate on the perimeter of the wheel the four points

$$a,\quad a + n_i/m_i,\quad a + x,\quad a + N_i/M_i;$$

the subscript i corresponding to some term in the sequence.

We know that the probability of the pointer resting in the interval
a to a + n_i/m_i (see Fig. 11) is equal to n_i/m_i. But it cannot rest
in this interval without also being in the interval between a and a + x.
Therefore, P(x) is at least as great as n_i/m_i, P(x) being the desired
probability. Similarly, whenever the pointer is within the interval
(a, a + x) it is also within the interval (a, a + N_i/M_i). Hence P(x)
cannot exceed N_i/M_i. That is:

$$\frac{n_i}{m_i} \le P(x) \le \frac{N_i}{M_i}.$$

This inequality is true for any value of i. As i gets larger
and larger, however, the two quantities n_i/m_i and N_i/M_i
converge to the same limit x, and as P(x) constantly lies
between them, no matter how close to x they may come, it
follows of necessity that P(x) = x.


The probability of the pointer lying between a and b is equal 
to the length of the interval regardless of whether that length is 
rational or not. 
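
The theorem can be checked roughly by simulation. The sketch below assumes that Python's pseudo-random generator is an adequate stand-in for the well-balanced wheel; the function name, the seed, and the number of spins are arbitrary choices of ours. The interval is taken to contain a but not a + x, and is allowed to wrap past the zero of the scale.

```python
import random

def spin_frequency(a, x, spins=200_000, seed=1):
    """Fraction of simulated spins on which the pointer rests in [a, a + x)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(spins):
        u = rng.random()            # resting position on a perimeter of unit length
        if (u - a) % 1.0 < x:       # distance past a, measured around the wheel
            hits += 1
    return hits / spins

x = 2 ** -0.5                       # an interval of irrational length, 1/sqrt(2)
print("length of interval:", x)
print("observed frequency:", spin_frequency(0.1, x))
```

The observed frequency is only an estimate, of course; it fluctuates about the length of the interval by an amount that shrinks as the number of spins is increased.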


We have required that the perimeter of the wheel be of 
unit length. This, however, instead of defining the size of 
the wheel, merely defines the unit in terms of which all lengths 
are measured. The theorem only states, therefore, that the 
probability of lying within any interval is equal to the length 
of that interval measured in terms of the perimeter of the wheel 
as a unit of measure. As the probability would be the same 
no matter what unit of length was used, we conclude that: 




The probability of lying within an interval of length x upon 
a wheel the circumference of which is L, is x/L. 


This is, in a sense, an extension upon the definition of 
“the measure of probability,” for that definition cannot be 
applied to this problem at all. It is made necessary by the 
fact that “length” is a continuous variable. But the extension
is a theorem, rather than a new axiom, for it is a logical
consequence of the accepted relationship of irrational to 
rational numbers. 

Finally, we note that the revolving wheel of our example 
serves no other purpose than to assure that the chosen point is 
equally as likely to lie within one interval as another. The 
same probability would have been obtained with any other 
mechanism for which equal intervals were equally likely. 
We can therefore take as the final form of the theorem at which 
we have arrived: 


If a point is placed upon a line segment of length L in such a
way that equal intervals are equally likely to contain it, the chance
that it lies within an interval of length x is x/L.


Such a point will be said to be placed “at random” on the
line. 


§ 56. A Paradox Associated with the Random Choice of a Point
on a Line Segment 


There is a curious paradox associated with this matter of choosing 
a point on a line segment which is of interest because it throws some 
light upon—or at least calls attention to—the meaning of zero 
probability.

The paradox arises in connection with the question, What is the 
probability that the point chosen is the midpoint of the line L? 
If we construct an interval of length x about this midpoint, we know
that the chance of the point lying within this interval is x/L. It
is possible, however, for the point to lie within the interval and
still not coincide with the midpoint: we conclude therefore that the
desired probability is less than x/L. This must be true no matter
how small the interval becomes, which is only possible provided the
desired probability is zero. That is, the probability that a point placed
at random upon a line bisects that line is zero.



This argument has been carried out for the midpoint of the line L. 
It is obvious, however, that it could have been carried out equally
well for any other point.¹ We therefore conclude that the probability
that a point placed at random upon the line L coincides with any pre-
assigned point is zero.

However, the random point must have fallen somewhere on the 
line, that is, it must have fallen upon some point or other. But 
according to the statement in the last paragraph the a priori prob- 
ability that it would fall where it did was zero. If, then, zero prob- 
ability means that the event cannot occur, the random point has 
done the impossible, for it cannot be where it is, and yet it most 
assuredly is there. 

The paradox is only one of many associated with the concept of
infinitely large numbers, or of limiting processes in general.² To
understand it thoroughly we must go back to our original definition
of probability, according to which we deal with a group of m equally
likely events, and ask for the probability that one of a sub-group n
occurs. The measure of the probability is the ratio n/m. So long
as m is fixed this ratio can only vanish provided n is zero; that is,
provided none of the possible events is included in our sub-group.
In such cases, zero probability means impossibility. If, however,
by some means we keep the number n constant and increase m
indefinitely we can cause the probability of success to be as small
as we please. For example, if there are ten balls in a box, n of which
are red and 10 − n white, the chance that a ball drawn from this
box is red is n/10. It can only be zero provided there are no red
balls in the box, in which case the drawing of a red ball is obviously 
impossible. The same is true if the number of balls is a hundred, 
a thousand, or a million. But if, having started with ten balls in 
the box, we add larger and larger quantities of white balls, the 
chance of drawing a red ball from the box obviously becomes smaller 
and smaller, and can in fact be made as small as we please by adding 
enough white balls. Thus, if there was originally only one red ball, 
so that the original probability was 1/10, we are able to make this 
probability less than 1/100 by adding 100 white balls (it is then 


¹ We are prone to feel (though our better judgment tells us it is not true) that
it would be more unusual for the point to bisect the interval than to fall on some
other point that “had nothing peculiar about it.” Symmetrical effects always
impress us unduly. There is a case on record where each of four bridge players drew a
complete suit. “Most remarkable,” we say: yet actually no less probable than the
last hand you held. Each had a probability of 4.48·10⁻²⁸. The remarkable thing
about it was its symmetry, not its rarity.


² It must be remembered that infinity is not a number, but a limit.


1/110). If we are asked to make it less than one in a million we
can do it by adding a million white balls. In general, if we are
asked to make it less than any number ε we can do this by adding m
balls, where m > 1/ε. So long as m is finite the resulting probability
is also finite and we have no dilemma. We are merely
assured of the fact that the probability is exceedingly small. But
as m becomes infinite the probability in question becomes zero.
Logically, therefore, zero probability means impossibility only when
the group of events is finite in extent; when the group of events is
infinite it means merely that the sub-group of favorable events is
but an infinitesimal part of the whole.
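
The arithmetic of this argument is easily verified. The following sketch uses exact fractions; the function names are ours, and the box is assumed, as in the text, to begin with ten balls of which one is red.

```python
import math
from fractions import Fraction

def chance_of_red(added_white, red=1, original=10):
    """Chance of drawing red after adding white balls to the original box."""
    return Fraction(red, original + added_white)

def white_balls_needed(eps):
    """A whole number m with m > 1/eps, which the argument says is sufficient."""
    return math.floor(1 / Fraction(eps)) + 1

print(chance_of_red(100))                      # the case worked in the text
for eps in (Fraction(1, 100), Fraction(1, 10**6), Fraction(1, 10**12)):
    m = white_balls_needed(eps)
    p = chance_of_red(m)
    assert 0 < p < eps                         # small, yet never actually zero
    print(f"eps = {eps}: add {m} white balls, chance = {p}")
```

However small ε is chosen, the chance remains a positive fraction for every finite m; it is only in the limit that it becomes zero.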

There is a certain sense in which theory and practice diverge at 
this point, for many things which are logically possible are practical 
impossibilities. Looked at from a practical standpoint, zero prob- 
ability always means impossibility. With an infinity of balls the 
red ball might conceivably be drawn but the drawing of it would be a 
miracle rather than a “practical possibility.”²

Now the placing of a point upon a line is in exactly this class. 
We could not accurately bisect the line if we tried our utmost to do
it, much less if we cut it at random; and yet the bisecting of the 
line is not a logical impossibility. 


§ 57. Extension of the Significance of the Preceding Paragraphs 


So far in this chapter we have talked continually about 
“placing a point at random” on a line segment. In the 
first instance we thought of this as being done by means of a 
machine of a special type. Then we made our concept some- 
what more general by dropping the conception of the machine 
entirely, and talking merely about the placing of the point, 
regardless of the mechanism employed. It has now become 
desirable to recognize the fact that the arguments which we
have been carrying out have an even more general significance
than this, and that the talk about “points” and “line segments”
has been merely a form of expression which recommended itself
by making it somewhat easier to visualize the steps of the argument.

¹ It is unfortunate, in a way, that mathematics has no separate notation for numbers
reached through limiting processes. If it had, we could say that zero (the number)
as a probability means the thing is impossible; while zero (the limit) must be
interpreted in terms of the limiting process which gives rise to it.

² There are propositions which are logically as well as practically impossible. We
usually call them absurdities. For example: “x is less than a number which is less
than x”; or, “He is his mother’s father.”

To place a point upon a line which carries a scale is equiva- 
lent to choosing a number. Conversely, whenever any quan- 
tity whatever is measured, its magnitude can be plotted, 
thereby determining a point. Hence locating points on lines
and measuring quantities are interchangeable ideas: whatever 
is true of one of them is true mutatis mutandis of the other also. 
The two fundamental results which have so far been obtained 
are therefore capable of being framed as follows: 


If by any process whatever a number is obtained concerning
which two things are known: (1) that it cannot be less than x₁
nor greater than x₂; (2) that it occurs at random between these
limits; then the probability that it lies between a and b is
(b − a)/(x₂ − x₁), provided, of course, that both a and b lie within
the interval in question.


The probability that the chosen number is equal to any pre-
assigned number x is zero.


In the sections which follow we shall drop our mechanistic 
ideas as to how the numbers are chosen and shall speak in these 
more general terms instead. 


§ 58. Distribution Functions for Continuous Variables 


Just as “ equally likely events,” in spite of their theoretical 
importance in giving us a starting point for the discussion of 
discrete groups of events, are comparatively rare in practical 
studies, so “randomness” (or, if we prefer the phrase, “equally
likely intervals”) is also of greater theoretical than practical
importance. Most variables which arise in engineering have 
certain preferred ranges, and other ranges which are exceed- 
ingly rare. We have mentioned the resistance of a lamp 
filament as a quantity which cannot be forecast with absolute 
certainty; yet if the ideal at which production aims is 300
ohms, it is much more likely that a particular lamp will be
found to lie within the ten-ohm range between 300 and 310
than in the equal range between 390 and 400.

Such a variable is best thought of in connection with its 
distribution curve—that is, a curve so constructed that the 
area under it, between the ordinates at x = a and x = b,
represents the chance of the variable x lying within these
limits. Suppose Fig. 12 is such a curve. It follows at once
that the total area under the curve, from −∞ to +∞, must
be unity, for x is certain to have a value between these limits. 
Suppose we let this curve have the equation y = p(x), and 
suppose we denote by p(> a) the chance of x exceeding a, 


and by p(< a) the chance of it being less than a. We see
immediately that these three functions are related as follows:

$$p(>a) = \int_a^{\infty} p(x)\,dx,$$

$$p(<a) = \int_{-\infty}^{a} p(x)\,dx.$$

One gives the area to the right of x = a, the other the area
to the left. In general, the probability of a value of x between
a and b is

$$\int_a^b p(x)\,dx.$$

Fig. 12.

Suppose, now, that b and a are very nearly equal: say,
b = a + da. If the curve y = p(x) is continuous in the
neighborhood of x = a, as it usually is in practice, the figure
bounded by the x-axis, the curve, and the two ordinates at a
and a + da does not differ much from a rectangle, and hence
its area does not differ much from p(a) da. The difference, in
fact, is an infinitesimal of order higher than the first in da,
which may usually be ignored. Hence, so long as we deal with
very narrow ranges, it is usually quite satisfactory to regard
p(a) da as the probability of a value occurring within such a
range.¹
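
A numerical check may make the order of the neglected term plainer. In the sketch below the distribution curve is assumed, purely for illustration, to be a normal curve centered on 300 ohms with a standard deviation of 20 ohms; neither the form of the curve nor these figures come from the text. The exact area between a and a + da is compared with the rectangle p(a) da.

```python
import math

MEAN, SD = 300.0, 20.0          # hypothetical lamp resistances, in ohms

def p(x):
    """Ordinate of the assumed distribution curve at x."""
    z = (x - MEAN) / SD
    return math.exp(-0.5 * z * z) / (SD * math.sqrt(2.0 * math.pi))

def chance_below(x):
    """Area under the assumed curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - MEAN) / (SD * math.sqrt(2.0))))

def exact_area(a, da):
    """Chance of a value between a and a + da."""
    return chance_below(a + da) - chance_below(a)

def rectangle_error(a, da):
    """Amount by which the rectangle p(a) da misses the true area."""
    return abs(exact_area(a, da) - p(a) * da)

for da in (10.0, 1.0, 0.1):
    print(f"da = {da:5}: area = {exact_area(310.0, da):.6f}, "
          f"p(a) da = {p(310.0) * da:.6f}, "
          f"difference = {rectangle_error(310.0, da):.2e}")
```

Reducing da tenfold reduces the difference roughly a hundredfold, which is what is meant by saying that the difference is of higher order than the first in da.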

As illustrations of distribution functions for variables 
which are not distributed at random, we give two simple 
examples. One is a highly artificial case in which the prob- 
abilities can be calculated; the other is of the much more 
common type where they must be inferred as well as may be 
from the results of a long series of observations. 


§ 59. A Variable which is Not Distributed at Random


Example 38.—The circumference of the wheel of Example 37
carries a logarithmic scale numbered from 1 to 10. What is the dis- 
tribution curve for the choice of numbers on this scale? 


Let us denote the numbers appearing on the scale by y, 
and their distances from the number 1 by x. Then x = log y. 
This is the definition of a logarithmic scale. 

Now choose two numbers a and b. The distance between
them is log b − log a; and the circumference of the wheel is
log 10 − log 1. Hence the probability of the pointer indicating
a number between a and b is (log b − log a)/(log 10 − log 1).

If we use logarithms to the base 10, log 10 = 1 and log 1 = 0,
so this probability is simply log b − log a. Our distribution
curve, Fig. 13, must therefore needs be of such a nature that
for any interval (a, b) the area


¹ It is also not unusual to use the phrase “the probability that x takes the value a
is p(a),” meaning thereby that the chance of x lying between a and a + da differs
from p(a) da by an infinitesimal of the second order at least in da; or, in other words,
that the ordinate to the distribution curve at a has the length p(a).


is given by this formula. But obviously this means that

$$\int_a^b p(y)\,dy = \log b - \log a.$$

Differentiating this with respect to b we obtain¹

$$p(b) = \frac{\log e}{b},$$

or, if we prefer to write it that way,

$$p(y) = \frac{\log e}{y},$$

as the definition of our curve.
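
The result of Example 38 can be checked in two ways, sketched below: by integrating the curve numerically, and by simulating the wheel. The midpoint rule, the number of steps, the seed, and the function names are our own choices; the pseudo-random generator is assumed to be an adequate stand-in for the wheel.

```python
import math
import random

def p(y):
    """Ordinate of the distribution curve for the logarithmic scale."""
    return math.log10(math.e) / y

def area(a, b, steps=10_000):
    """Area under p between a and b by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(p(a + (i + 0.5) * h) for i in range(steps))

def simulated(a, b, spins=200_000, seed=1):
    """Fraction of spins on which the scale reading y falls in [a, b)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(spins):
        x = rng.random()        # distance from the mark 1, at random on the rim
        y = 10.0 ** x           # the number read off the scale, since x = log y
        if a <= y < b:
            hits += 1
    return hits / spins

a, b = 2.0, 5.0
print("log b - log a :", math.log10(b) - math.log10(a))
print("area under p  :", area(a, b))
print("simulation    :", simulated(a, b))
```

The first two figures agree to many decimal places; the third agrees only within the fluctuation to be expected of a finite number of spins.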

This case can be solved directly from our definition of 
probability. Let us now take one upon which we can get 
no grip theoretically. 


§ 60. Distribution Functions Derived Empirically 


Example 39.—Construct a distribution curve of length of life: in
other words a curve the area under any portion of which is a child’s
chance at birth of dying within the corresponding range of ages.


It is certainly true that this problem cannot be solved from
a purely theoretical standpoint. Equal age ranges are known
not to be equally likely, and most of the other facts of mortality
are tempered by experience in the same way. The
best we can do is to resort to vital statistics and find what
proportion of the population has been observed to die within
various age limits. In a sense each life is an “independent
trial,” and we can conclude from Bernoulli’s Theorem that,
if the cases are numerous enough, the proportion of deaths
between any two age limits is not likely to deviate much from
the probability of dying in that range.

¹ Remember that these are logarithms with base 10. To renew acquaintance
with the formula of differentiation in such cases we may point out that, if x is any
number whatever,
$$x = e^{\log_e x}.$$
Also
$$e = 10^{\log_{10} e}.$$
Substituting the second in the first,
$$x = 10^{(\log_e x)(\log_{10} e)}.$$
But by definition
$$x = 10^{\log_{10} x}.$$
Comparing these, it follows that
$$\log_{10} x = \log_{10} e \cdot \log_e x.$$
Differentiating,
$$\frac{d}{dx}\log_{10} x = \frac{\log_{10} e}{x}.$$

The curve of Figure 14 has been plotted in this manner from
certain German data given by Czuber in his Wahrscheinlichkeitsrechnung.
The general character of the curve would be much the same for data
taken from almost any civilized country; though it is obvious that
different conditions of sanitation, and particularly different customs
in the handling of children, would affect it somewhat.
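
The manner of plotting such a curve from observations can be sketched as follows. The "ages at death" used here are synthetic, drawn from a triangular law chosen only because its true probabilities are easy to write down; they are not Czuber's data and make no pretense of resembling real mortality. The bin width of five years and the function name are likewise our assumptions.

```python
import random

def empirical_density(ages, width=5.0, upper=100.0):
    """Heights of a stepped curve whose area over any set of bins equals
    the observed proportion of cases falling in those bins."""
    bins = int(upper / width)
    counts = [0] * bins
    for age in ages:
        index = min(int(age // width), bins - 1)   # keep age == upper in the last bin
        counts[index] += 1
    n = len(ages)
    return [c / (n * width) for c in counts]

rng = random.Random(1)
# Synthetic observations only: triangular between 0 and 100 with mode 75.
ages = [rng.triangular(0.0, 100.0, 75.0) for _ in range(100_000)]
density = empirical_density(ages)

# Estimated chance of dying between 50 and 75: area under bins 10 through 14.
estimate = sum(density[10:15]) * 5.0
true_value = (75.0 ** 2 - 50.0 ** 2) / 7500.0      # from the triangular law
print("observed proportion:", estimate)
print("true probability   :", true_value)
```

With a hundred thousand cases the observed proportion differs from the true probability by a small fraction of one per cent, as Bernoulli's Theorem leads us to expect; with a few hundred cases the agreement would be much rougher.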

The curve is, therefore, of the nature of a “ conditional 
probability” curve: it gives a child’s chance of dying at a 
certain age if he is born at a particular place. It is also 
conditional in another sense which is of more consequence to 
the engineer. The conditions of life are not static: medical 
efficiency, for example, is increasing. Hence the curve depicts 
a child’s chances of life if born at a given time. In the very 
nature of things no child can be born at that time: for the 
existence of the data implies that the time is past. To-day 
a child’s chances are different, probably in some such way 
as is indicated by the dotted line, though such a curve can be 
based on no more substantial foundation than the estimation 
of a “trend,” or rate at which the probabilities are changing.

Fig. 14. A Typical Life Curve (ordinate: probability of death).

The important point to notice, however, is that the data
are unsatisfactory because the probabilities change with time.
When this is not true, conditions are said to be in “statistical
equilibrium.”


Present probability can only be inferred from data already 
collected when the system under consideration is in statistical 
equilibrium. 


Some conception of the significance of the uncertainties 
introduced into life insurance by this factor can be gotten by 
considering the position in which the insurance companies 
would find themselves if the “trend” were toward shorter
rather than longer life. They would then face almost certain 
loss if they based their rates upon available statistics, and 
would be forced to estimate as best they could the factor by 
which present conditions differed from past, with obvious evil 
consequences if they made a wrong guess. As it is they face 
almost certain gain; which is quite satisfactory to the com- 
panies, and not very serious to the policy-holder if his com- 
pany has his interests at heart. 

Now, we have no interest in life insurance as such. But 
the same conditions are constantly met in engineering. Take, 
for instance, Example 36, in which certain empirical data were 
made the foundation for a solution of the problem. Perhaps, 
in the case in question, such data could be thoroughly relied 
upon, for screw-making is, I suppose, a pretty stable process, 
and not subject to rapid improvement. Suppose, however, 
that the same sort of argument were attempted in the case 
of some sort of radio accessory: about the time we began to 
feel some confidence in our data someone would “improve” 
the process, and we would be confronted with a set of conditions
to which our data no longer applied exactly, perhaps not
at all.

In other words, manufacturing conditions, like life, are not
in statistical equilibrium. Instead, the probabilities are functions
of time, and those functions are generally unknown.




§ 61. Distribution Functions in Many Variables 


To the conception of “compound events” in the case of 
discrete variables correspond “ functions of several variables ” 
when the variables are continuous. As an example, suppose
we are stamping out metal discs on a punch press, to be used in 
operating a slot machine. If the slot machine is so constructed 
as to reject a coin the weight of which deviates too much from 
standard, we will naturally be interested in the probable 
variation in weight of our product. The weight, however, 
depends principally upon two factors: the thickness, t, of the
particular portion of the sheet from which the disc comes, and
its area, a. Both of these are subject to variation, and both are
obviously continuous variables.

Suppose we represent a and t as Cartesian coordinates, as in
Fig. 15, and build up a “distribution surface”; that is, a surface
such that the volume under any portion of it is equal to the
probability of the point (a, t) lying under that portion. Call the
height of such a surface p(a, t). Then the following things are
immediately obvious:


1. The probability of a pair of values within the ranges
(a, a + da), (t, t + dt) is p(a, t) da dt, except for an infinitesimal
of higher order in both da and dt.


2. The probability of a value of a in the range (a, a + da)
is equal to the volume of the vertical slab the base of which is
bounded by the lines a and a + da. If we call it p(a) da,
it is related to p(a, t) by the law¹

$$ p(a)\, da = da \int p(a, t)\, dt. $$

¹ “Except for a differential of higher order in da.” The general idea should by
now be clear enough to allow the omission of such statements in the future.
Differential notation implies a limiting process for complete accuracy.




148 PROBABILITY AND ITS ENGINEERING USES 


3. The probability of a value of t between t and t + dt
is represented by a similar slab. The corresponding formula is

$$ p(t) = \int p(a, t)\, da. \tag{56} $$


It is understood, of course, that the integral extends over every 
possible value of a. 


4. We can, if we wish, think of the occurrence of a value of
a in the slab between a and a + da as “event A,” and the
occurrence of a value of t between t and t + dt as “event B.”
To these events the argument of § 47 applies directly, wherefore
we have

$$ P(AB) = P(A)\, P_A(B). \tag{20} $$


But we already know that P(A) differs from p(a) da by
an infinitesimal of higher order in da, which we may formulate
in the form of an equation by writing

$$ P(A) = [p(a) + \epsilon]\, da, $$

it being understood that ε vanishes with da. Similarly we
know that P(AB) differs from p(a, t) da dt by an infinitesimal
of higher order than da dt. Hence

$$ P(AB) = [p(a, t) + \epsilon']\, da\, dt, $$

where ε′ also vanishes.
Introducing these expressions into (20) and making a few
obvious rearrangements we obtain

$$ \frac{P_A(B)}{dt} = \frac{p(a, t) + \epsilon'}{p(a) + \epsilon}. $$


Now let da and dt approach zero. The right-hand side
of the equation approaches the limit p(a, t)/p(a), which we
may call p_a(t) if we like. Therefore the left-hand side must
also approach this same limit. This means, of course, that
p_a(t) dt differs from P_A(B) by an infinitesimal of higher order
than dt. We are therefore justified in calling p_a(t) dt “the
conditional probability of the point lying between t and
t + dt, if it is known to lie between a and a + da”; meaning,
as in all such differential expressions, that a differential of
higher order is ignored.

However, the very definition of p_a(t) shows that it satisfies
the relation

$$ p(a, t) = p(a)\, p_a(t), \tag{57} $$

which is the exact counterpart of (20) in the case of continuous
variables.
If we substitute (57) in (56) we get

$$ p(t) = \int p(a)\, p_a(t)\, da, \tag{58} $$

which is an exact counterpart of (21).


5. Finally, if we had started from the slab for which t lies
between t and t + dt we would have concluded that

$$ p(a, t) = p(t)\, p_t(a); $$

and thereafter, by comparing this equation with (57) and
making certain obvious transformations, that

$$ p_t(a) = \frac{p(a)\, p_a(t)}{p(t)}, \tag{59} $$

or from (58)

$$ p_t(a) = \frac{p(a)\, p_a(t)}{\int p(a)\, p_a(t)\, da}. \tag{60} $$

These are extensions of (53) and (54). That is, they are
Bayes’ Theorem for the case of continuous variables.
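
The relations (56) to (60) can be checked numerically. The following Python sketch assumes a hypothetical joint density p(a, t) = a + t on the unit square, chosen only because its marginals are easy to verify by hand; neither this density nor the function names come from the text. It computes the marginals by integration and then compares the Bayes form (60) with the direct quotient p(a, t)/p(t).

```python
def joint(a, t):
    """Hypothetical joint density p(a, t) = a + t on the unit square."""
    if 0.0 <= a <= 1.0 and 0.0 <= t <= 1.0:
        return a + t
    return 0.0


def integrate(f, lo=0.0, hi=1.0, n=400):
    """Midpoint-rule approximation to the integral of f from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def p_a(a):
    """Marginal density of a: integrate t out of p(a, t)."""
    return integrate(lambda t: joint(a, t))


def p_t(t):
    """Marginal density of t, as in (56)."""
    return integrate(lambda a: joint(a, t))


def cond_t_given_a(t, a):
    """p_a(t) = p(a, t) / p(a), the definition behind (57)."""
    return joint(a, t) / p_a(a)


def cond_a_given_t(a, t):
    """p_t(a) computed from the Bayes form (60)."""
    numerator = p_a(a) * cond_t_given_a(t, a)
    denominator = integrate(lambda x: p_a(x) * cond_t_given_a(t, x))
    return numerator / denominator


if __name__ == "__main__":
    a, t = 0.3, 0.6
    # The two printed numbers should agree: (60) versus p(a, t) / p(t).
    print(cond_a_given_t(a, t), joint(a, t) / p_t(t))
```

For this particular density the marginals are p(a) = a + 1/2 and p(t) = t + 1/2, so both routes should give (a + t)/(t + 1/2).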
Obviously, the significance of “thickness” and “area”
which we have attached to t and a is unimportant: the formulae
are true in general. Nor are the ideas confined to the case of
two variables only. Entirely similar formulae could be written
for any number of variables. For instance, the velocity of a
gas molecule may be defined in terms of its components in
three coordinate directions x, y, z. There is, of course, a
certain probability of a molecule possessing this velocity.¹
We denote it by p(u, v, w); u, v, w being the three components
in question.

Of course we mean by this, that the chance that all three
components lie within the ranges (u, u + du), (v, v + dv),
(w, w + dw) is p(u, v, w) du dv dw. The geometric picture,
however, is no longer quite so simple as before, for it requires
three dimensions to depict the variables u, v, w, and a fourth
would be needed for p. As we cannot visualize four-dimensional
space we can only proceed by analogy; and in doing
this it is simpler to speak of “integrating p throughout a
volume” than to try to visualize the fourth dimension. Thus,
referring again to the case of two variables, (56) tells us that
p(a) da is the result of “integrating p(a, t) over the vertical
strip bounded by the lines a and a + da.” In this language
we say that the probability of the point (u, v, w) lying within
any closed volume is obtained by integrating p(u, v, w) over
the volume in question. It is not necessary that this volume
be bounded by coordinate planes. It is entirely unrestricted.


§ 62. Some Examples of Change of Variable in Distribution 
Functions 


In studying gas molecules we are not always interested in
their velocity. We may want to discuss speed, direction,
momentum, energy, or any number of other things. But as all these
are determined by the velocity,² we ought to be able to find
the probability of any of them in terms of p(u, v, w). This is
really the case.

For example, suppose we are required to find the chance


¹ We are using the term “velocity” in its scientific sense of “speed and direction,”
not merely “speed.” Two bodies, one moving along the x-axis at the rate of one
centimeter per second, the other at the same rate along the y-axis, have the same
“speed” but different “velocities.” To say that two bodies have the same “velocity”
implies that all three components are equal.

In other words, velocity is a vector.

² And the mass, which is generally a constant.


§ 62. CHANGE OF VARIABLE 151 


that a molecule has a speed less than s. This means that
u, v, w must have such values that u² + v² + w² < s²; in
other words, that the point (u, v, w) must lie inside the sphere
of radius s about the origin. Thus

$$ p(<s) = \iiint p(u, v, w)\, du\, dv\, dw, $$

the limits of integration being the surface of the sphere.

Or, in the two-dimensional case, suppose we want the
probability that the weight w lies between w_0 and w_0 + dw.
We have already noted that¹ w = ta. Hence t and a must lie
inside the strip bounded by

$$ ta = w_0, \qquad ta = w_0 + dw. $$

[Fig. 16.]

These are the hyperbolas shown in Fig. 16. Hence

$$ p(w_0)\, dw = \int da \int_{w_0/a}^{(w_0 + dw)/a} p(a, t)\, dt. $$


We are supposing that dw is very small. If so, the limits
of integration upon t are very close together, except when a
itself is exceedingly small. An exceedingly small a, however,
means (in our problem) a disc of negligible area. Obviously
such discs are so unlikely that we shall commit no practical
error in asserting that they never occur. Let us say that discs
of area less than a_0 never occur. Then the t-limits are always
very close together, if dw is small.

Further, if p(a, t) is continuous, its value will not differ


1 Usually there would be a density factor. We may suppose the unit of weight to 
be so chosen, however, that this factor is unity. 


much from p(a, w_0/a) over the entire range of integration.
Hence

$$ \int_{w_0/a}^{(w_0 + dw)/a} p(a, t)\, dt \approx p\!\left(a, \frac{w_0}{a}\right) \frac{dw}{a} $$

and

$$ p(w_0)\, dw \approx \int da\; p\!\left(a, \frac{w_0}{a}\right) \frac{dw}{a}. $$

The approximation is within the second order of the infinitesimal
dw, and since our interpretation of p(w_0) implies either
a limiting process or an error of that magnitude, it is allowable
to replace ≈ by =. Hence, finally, dropping the subscript
zero which really means nothing to us,

$$ p(w) = \int p\!\left(a, \frac{w}{a}\right) \frac{da}{a}. \tag{61} $$


We shall wish to refer to this result again, and it will prove
wise to call attention at this time to a false argument which
we might thoughtlessly have carried out. It is this: to demand
that w take the value w_0 is equivalent to demanding that t
take the value w_0/a. But by (56) this is

$$ \int p\!\left(a, \frac{w_0}{a}\right) da. $$

This, however, is not identical with the correct answer (61).
The reader will have no difficulty in locating the error.
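
The difference between (61) and the false argument can be seen numerically. The Python sketch below assumes, purely for illustration, that a and t are independent and each uniform on the interval from 1 to 2 (a hypothetical choice, not one made in the text). It evaluates both expressions and integrates each over all possible weights; a genuine distribution function must integrate to unity.

```python
import math


def p_joint(a, t):
    """Hypothetical joint density: a and t independent, uniform on [1, 2]."""
    return 1.0 if 1.0 <= a <= 2.0 and 1.0 <= t <= 2.0 else 0.0


def midpoint(f, lo, hi, n):
    """Midpoint-rule approximation to the integral of f from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def p_w(w, n=2000):
    """Equation (61): integrate p(a, w/a) / a over a."""
    return midpoint(lambda a: p_joint(a, w / a) / a, 1.0, 2.0, n)


def p_w_naive(w, n=2000):
    """The false argument: integrate p(a, w/a) over a, omitting the 1/a."""
    return midpoint(lambda a: p_joint(a, w / a), 1.0, 2.0, n)


def total(density, n=300):
    """Integral of a density of w over its whole range, here 1 <= w <= 4."""
    return midpoint(density, 1.0, 4.0, n)


if __name__ == "__main__":
    # For this example (61) gives log(w) when 1 <= w <= 2.
    print(p_w(1.5), math.log(1.5))
    print(total(p_w), total(p_w_naive))
```

With these assumptions the total for (61) comes out close to 1, while the naive expression totals about 1.5, so it cannot be a probability density.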
Both these illustrations may be classified as “change of
variable”: we have shifted our attention from u, v, w to s;
or from t, a to w. In each case we have at the same time
reduced the number of variables, though we need not have
done so. We might, if we had desired, have asked for the
probability of a weight w and a radius r: two new variables
rather than one. Such changes of variable are obviously to be
expected as our attention shifts from one phase of a problem
to another; and as we have seen that a simple substitution of
new variables for old in the equation of p does not give correct
results, we must investigate the matter in detail.


§ 63. Change of Variable in Distribution Functions 


Let us consider first a case in which only one variable is
involved, and think in terms of Examples 37 and 38. In solving
38 we really made use of our knowledge of the solution
of 37: that is, we knew p(x) and the relation x = log y,
and from it found p(y).

[Fig. 17.]

Let us suppose that p(x) is represented by the curve
of Fig. 17. In Example 38, of course, it was a straight
line parallel to the x-axis; but as we wish to make sure
that our final result is perfectly general, we choose a better
picture of “any function.” Suppose also that p(y) has
somehow been found, and is represented by the curve of
Fig. 18. Finally, suppose the relation between y and x is known
in the form of an equation

$$ y = f(x), \tag{62} $$

just as in Example 38 it was known in the form y = 10^x.

[Fig. 18.]

The chance that x lies between two values
x_0 and x_0 + dx is represented by the shaded area in Fig. 17.
But whenever x lies within this interval, y also must lie within
fixed limits. It will, in fact, lie between y_0 = f(x_0) and
y_0 + dy = f(x_0 + dx); that is, within the shaded area of
Fig. 18. As neither event can happen unless the other does,
these shaded areas must be equal.

Now suppose that dx is very small. The same will also
be true of dy.¹ Hence both shaded areas are nearly rectangular.
If, then, h and b denote height and breadth,

$$ h_x b_x = h_y b_y. \tag{63} $$

But the heights are by definition p(x) and p(y), while the
breadths are dx and f(x_0 + dx) - f(x_0), the latter of which
is approximately² f′(x_0) dx. Hence, dropping the subscripts,
which are no longer of value to us, we find the equation of the
new distribution curve to be

$$ p(y) = \frac{p(x)}{f'(x)}. \tag{64} $$

Of course the y and x in these equations are corresponding
values. In Example 38, for instance, the y is always 10^x and
conversely x is always log y. Hence for this special problem
(64) becomes

$$ p(y) = \left[ \frac{p(x)}{d(10^x)/dx} \right]_{x = \log y}, $$

which reduces to the answer obtained in § 59 when we recall
that p(x) = 1 and that the derivative of 10^x is 10^x/log e.

Similarly, in (64) the proper x may be obtained by solving
(62). As it depends on y for its value, we may properly call
it³ x = f⁻¹(y), and write (64) as

$$ p(y) = \left[ \frac{p(x)}{f'(x)} \right]_{x = f^{-1}(y)}. \tag{65} $$

¹ Unless dy/dx, as determined from (62), is infinite at x = x_0.

² Under the conditions of the last note.

³ The expression for x in terms of y is called the “inverse” of the expression for
y in terms of x, and is always denoted by the index -1. Thus the inverse of the
y = f(x) = 10^x in § 59 is x = f⁻¹(y) = log y.

This, then, is a general formula by means of which, when the
distribution function for any variable x is known, together with
the relationship between x and some other variable y, the
distribution function for y may be obtained. It says nothing
whatever, except that the curves must be so defined that
corresponding elements of area are equal.
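
As a rough check of the one-variable rule p(y) = p(x)/f′(x), evaluated at x = f⁻¹(y), the Python sketch below applies it to the case named in the text: x uniformly distributed between 0 and 1 and y = 10^x. The helper names are hypothetical, and the absolute value of f′(x) is taken as an assumption so that the density stays positive for decreasing relations as well.

```python
import math


def change_of_variable(p_x, f_prime, f_inverse):
    """Build p(y) from p(x): p(x) / |f'(x)| evaluated at x = f_inverse(y)."""
    def p_y(y):
        x = f_inverse(y)
        return p_x(x) / abs(f_prime(x))
    return p_y


def p_x(x):
    """Uniform density on 0 <= x <= 1."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# y = 10**x, so f'(x) = 10**x * ln(10) and the inverse is log10(y).
p_y = change_of_variable(
    p_x,
    f_prime=lambda x: 10.0 ** x * math.log(10.0),
    f_inverse=math.log10,
)


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation to the integral of f from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    # Equal areas: the chance that y < 2 should equal the chance that x < log10(2).
    print(integrate(p_y, 1.0, 2.0), math.log10(2.0))
    print(integrate(p_y, 1.0, 10.0))
```

The two areas agree because the rule only demands that corresponding elements of area under the two curves be equal.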

Similar rules may be set up for distribution functions of
several variables. To illustrate how this is done, let us consider
first the case of two variables x and y. To represent these
requires a plane; any pair of values of x and y determine a
point (x, y) somewhere in the plane, and the distribution
function p(x, y) is defined as a function which, when integrated
over a region dA, gives the probability that the point
(x, y) lies within dA. If dA is small enough, p(x, y) is sensibly
constant over the entire area, and the probability is
given by¹ p(x, y) dA.

Now suppose we have two other variables ξ and η, related
to x and y by means of the equations

$$ \xi = f(x, y), \qquad \eta = \phi(x, y), \tag{66} $$

corresponding to the equation y = f(x) in the case of one
variable. Suppose we choose another plane for the representation
of these, and seek to determine their distribution function
p(ξ, η). As the point (x, y) travels around the boundary of
dA, the corresponding point (ξ, η) will travel around some
curve in its own plane. This curve will bound an area da,
which we shall call the “element of area corresponding to dA.”
Whenever the point (x, y) lies in dA, (ξ, η) lies in da. Therefore

$$ p(\xi, \eta)\, da = p(x, y)\, dA. \tag{67} $$


If x and y are given in advance, the ξ and η of this equation
are to be found from (66). Conversely, if ξ and η are given,
the x and y must be found in terms of ξ and η by solving
(66) just as, in writing (65), y = f(x) had to be solved for x.

¹ In particular, if dA is a rectangle bounded by horizontal and vertical lines its
area may be called dx dy. The probability in question will then be p(x, y) dx dy.
But as it is by no means necessary that the area be of such a shape, we prefer to keep
the more general differential dA.

As has already been said, dA and da are corresponding
elements of area. To get p(ξ, η), therefore, we must multiply
p(x, y) by the quotient dA/da just as in (64) we multiplied
p(x) by the quotient dx/dy = 1/f′(x).

To find this quotient we choose dA as a rectangle bounded
by the values x, x + dx, y and y + dy. (See Fig. 19.) As
(x, y) travels along one of the lines which bound this rectangle
(ξ, η) will trace a curve of some form. The element da,
therefore, is bounded by four curved elements as shown in the
figure. If dx and dy are very small these elements will be so
nearly straight that the difference between the true da and the
area of the rectilinear figure having the corners a, b, c, d can
be ignored. Hence it is sufficient for the purpose of our
argument to locate the four points in the ξη-plane which
correspond to the corners of dA. We may then assume that
they are connected by straight lines.

[Fig. 19. Panels: xy-plane and ξη-plane.]

We consider first the point corresponding to (x, y); that
is, to the lower left-hand corner of dA. Let us suppose that
this is the point a of Fig. 19.¹ Its coordinates are given by (66).

¹ It might be any corner. It all depends on the signs of the derivatives of f and φ.

Next we take (x + dx, y), the lower right-hand corner of
dA. The corresponding corner b of da has the coordinates
f(x + dx, y) and φ(x + dx, y). If dx is small enough these
may be replaced by¹

$$ f(x, y) + \frac{\partial f}{\partial x}\, dx \quad \text{and} \quad \phi(x, y) + \frac{\partial \phi}{\partial x}\, dx, $$

respectively. But f(x, y) is ξ, and φ(x, y) is η; that is, these
are the coordinates of the point a. Hence (∂f/∂x) dx and
(∂φ/∂x) dx are the horizontal and vertical sides of the triangle abh.
Next we consider the upper left-hand corner (x, y + dy).
In the ξη-plane this becomes a point of which the coordinates
are easily found to be ξ + (∂f/∂y) dy and η + (∂φ/∂y) dy. It is
represented by c in Fig. 19.

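We do not need to know the coordinates of the fourth
point d, for if dx and dy are very small, da is very nearly a
parallelogram, and therefore its area is satisfactorily determined
by the two sides ab and ac. The area is, in fact, equal
to the product of the lengths ab and ac and the sine of the
angle included between them. In terms of the angles θ and
θ_1, this rule gives

$$ \begin{aligned} da &= \overline{ab} \cdot \overline{ac}\, \sin(\theta_1 - \theta) \\ &= (\overline{ab} \cos\theta)(\overline{ac} \sin\theta_1) - (\overline{ab} \sin\theta)(\overline{ac} \cos\theta_1). \end{aligned} $$

However

$$ \begin{aligned} \overline{ah} &= \overline{ab} \cos\theta = \frac{\partial f}{\partial x}\, dx, \\ \overline{bh} &= \overline{ab} \sin\theta = \frac{\partial \phi}{\partial x}\, dx, \\ \overline{ac} \cos\theta_1 &= \frac{\partial f}{\partial y}\, dy, \\ \overline{ac} \sin\theta_1 &= \frac{\partial \phi}{\partial y}\, dy; \end{aligned} $$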


1 Partial differential notation is used because y remains constant along this side 
of our rectangle. 




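wherefore

$$ da = \left( \frac{\partial f}{\partial x} \frac{\partial \phi}{\partial y} - \frac{\partial f}{\partial y} \frac{\partial \phi}{\partial x} \right) dA. $$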


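Thus we have found the ratio of da to dA. It serves our
purpose best to write it in the form of a determinant:

$$ \frac{da}{dA} = \begin{vmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \\[2ex] \dfrac{\partial \phi}{\partial x} & \dfrac{\partial \phi}{\partial y} \end{vmatrix}. $$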


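As f and φ really mean the same thing as ξ and η [we could,
in fact, have written (66) in the form ξ = ξ(x, y), η = η(x, y),
if we had wished], this determinant can just as well be written

$$ \frac{\partial(\xi, \eta)}{\partial(x, y)} = \begin{vmatrix} \dfrac{\partial \xi}{\partial x} & \dfrac{\partial \xi}{\partial y} \\[2ex] \dfrac{\partial \eta}{\partial x} & \dfrac{\partial \eta}{\partial y} \end{vmatrix}. \tag{68} $$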


Then, by (67),

$$ p(\xi, \eta) = \frac{p(x, y)}{\dfrac{\partial(\xi, \eta)}{\partial(x, y)}}, $$

it being understood, of course, that the x’s and y’s on the
right-hand side are to be expressed in terms of ξ and η by
solving (66).

If we were dealing with a problem in which there were three
variables, x, y and z, we should be forced to talk about elements
of volume instead of elements of area; and by an exactly
similar line of argument we could determine the ratio of the
volumes of corresponding elements in ξηζ-space and xyz-space.
We shall not go through the argument involved in this
case as it is identical in principle with that used above. The
result, however, is

$$ da = \begin{vmatrix} \dfrac{\partial \xi}{\partial x} & \dfrac{\partial \xi}{\partial y} & \dfrac{\partial \xi}{\partial z} \\[2ex] \dfrac{\partial \eta}{\partial x} & \dfrac{\partial \eta}{\partial y} & \dfrac{\partial \eta}{\partial z} \\[2ex] \dfrac{\partial \zeta}{\partial x} & \dfrac{\partial \zeta}{\partial y} & \dfrac{\partial \zeta}{\partial z} \end{vmatrix}\, dA, $$

the letters A and a being retained for corresponding elements
in the xyz- and ξηζ-spaces.

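In general, no matter how many variables x, y, z, ..., w
may be involved, when we change from one system of representation
to another the new differential element da is defined
in terms of the old element dA by the equation

$$ da = \begin{vmatrix} \dfrac{\partial \xi}{\partial x} & \dfrac{\partial \xi}{\partial y} & \dfrac{\partial \xi}{\partial z} & \cdots & \dfrac{\partial \xi}{\partial w} \\[2ex] \dfrac{\partial \eta}{\partial x} & \dfrac{\partial \eta}{\partial y} & \dfrac{\partial \eta}{\partial z} & \cdots & \dfrac{\partial \eta}{\partial w} \\[2ex] \dfrac{\partial \zeta}{\partial x} & \dfrac{\partial \zeta}{\partial y} & \dfrac{\partial \zeta}{\partial z} & \cdots & \dfrac{\partial \zeta}{\partial w} \\ \vdots & \vdots & \vdots & & \vdots \\ \dfrac{\partial \omega}{\partial x} & \dfrac{\partial \omega}{\partial y} & \dfrac{\partial \omega}{\partial z} & \cdots & \dfrac{\partial \omega}{\partial w} \end{vmatrix}\, dA, \tag{69} $$

or, symbolically,

$$ da = \frac{\partial(\xi, \eta, \zeta, \ldots, \omega)}{\partial(x, y, z, \ldots, w)}\, dA. $$

Hence

$$ p(\xi, \eta, \zeta, \ldots, \omega) = \frac{p(x, y, z, \ldots, w)}{\dfrac{\partial(\xi, \eta, \zeta, \ldots, \omega)}{\partial(x, y, z, \ldots, w)}}. \tag{70} $$

The determinant ∂(ξ, η, ζ, ..., ω)/∂(x, y, z, ..., w), in which the elements are all
possible partial derivatives of the new variables with respect to the
old, is known as the “Jacobian” of the transformation. Its geometrical
significance is contained in the concept which has led us to
it: the ratio of the differential elements in the two systems of
coordinates. About it we may make three interesting observations: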

In the first place, it is interesting to note that the simple case
discussed at the beginning of the section, in which we had only one
new variable y and one old variable x, falls in line with this more
elaborate equation. As only one relation y = f(x) was required to
define the new variable, only one row appears in the determinant (69);
and since there is only one variable x in the original system, there can
be only one derivative of f, and therefore only one column in (69).
The entire Jacobian therefore reduces to the single term df/dx, which
is therefore the ratio of our old and new differential elements. This
is, in fact, what we found it to be.

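In the second place, there is a perfect reciprocal relationship
between our old and new variables. Perhaps this is most easily
realized by recalling that, once p(ξ, η, ζ, ..., ω) has been found, it is a
“known distribution function,” and we could start with it and find
p(x, y, z, ..., w) by exactly the same argument as was used above.
The only essential difference would be, that instead of differentiating
the equations ξ = ξ(x, y, z, ..., w), η = η(x, y, z, ..., w), ..., to
form the Jacobian, we should want equations defining x, y, z, ..., w
in terms of ξ, η, ζ, ..., ω. Obviously the result of this reciprocal
relationship is

$$ p(x, y, z, \ldots, w) = \frac{p(\xi, \eta, \zeta, \ldots, \omega)}{\dfrac{\partial(x, y, z, \ldots, w)}{\partial(\xi, \eta, \zeta, \ldots, \omega)}}. \tag{71} $$

Now the p’s mean exactly the same things in (70) and (71). Hence
we obtain the following remarkable (though, considering its geometrical
significance, entirely reasonable) property of Jacobians: that

$$ \frac{\partial(x, y, \ldots, w)}{\partial(\xi, \eta, \ldots, \omega)} = \begin{vmatrix} \dfrac{\partial x}{\partial \xi} & \dfrac{\partial x}{\partial \eta} & \cdots & \dfrac{\partial x}{\partial \omega} \\[2ex] \dfrac{\partial y}{\partial \xi} & \dfrac{\partial y}{\partial \eta} & \cdots & \dfrac{\partial y}{\partial \omega} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial w}{\partial \xi} & \dfrac{\partial w}{\partial \eta} & \cdots & \dfrac{\partial w}{\partial \omega} \end{vmatrix} $$

is the reciprocal of

$$ \frac{\partial(\xi, \eta, \ldots, \omega)}{\partial(x, y, \ldots, w)} = \begin{vmatrix} \dfrac{\partial \xi}{\partial x} & \dfrac{\partial \xi}{\partial y} & \cdots & \dfrac{\partial \xi}{\partial w} \\[2ex] \dfrac{\partial \eta}{\partial x} & \dfrac{\partial \eta}{\partial y} & \cdots & \dfrac{\partial \eta}{\partial w} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial \omega}{\partial x} & \dfrac{\partial \omega}{\partial y} & \cdots & \dfrac{\partial \omega}{\partial w} \end{vmatrix}. $$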

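Suppose, for instance, that x and y are related to ξ and η by the
equations

$$ x = \xi \cos\eta, \qquad y = \xi \sin\eta. $$

Then

$$ \frac{\partial(x, y)}{\partial(\xi, \eta)} = \begin{vmatrix} \cos\eta & -\xi \sin\eta \\ \sin\eta & \xi \cos\eta \end{vmatrix} = \xi. $$

But by solving for ξ and η we find

$$ \xi = \sqrt{x^2 + y^2}, \qquad \eta = \tan^{-1} \frac{y}{x}, $$

whence

$$ \frac{\partial(\xi, \eta)}{\partial(x, y)} = \begin{vmatrix} \dfrac{x}{\sqrt{x^2 + y^2}} & \dfrac{y}{\sqrt{x^2 + y^2}} \\[2ex] \dfrac{-y}{x^2 + y^2} & \dfrac{x}{x^2 + y^2} \end{vmatrix} = \frac{1}{\sqrt{x^2 + y^2}} = \frac{1}{\xi} = \frac{1}{\dfrac{\partial(x, y)}{\partial(\xi, \eta)}}. $$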

Our third observation is, that, since the ratio of two volumes
cannot be negative,¹ the determinant which occurs in (69) should
always be positive. Whether it is or not, however, depends upon
the order in which the variables are written down; for we can always
change the sign of a determinant by interchanging two rows or two
columns. If, for example, we had happened to write the equations
of the preceding paragraph in the order

$$ y = \xi \sin\eta, \qquad x = \xi \cos\eta, $$

we would have found that

$$ \frac{\partial(y, x)}{\partial(\xi, \eta)} = -\xi. $$

It follows, therefore, that the sign with which our Jacobian
appears is largely a matter of accident, and that in every case the
positive sign is the correct one.

¹ In other branches of mathematics, particularly in Analysis Situs, the sign of the
Jacobian has an important significance. As we are obviously dealing with absolute
magnitudes, however, it is unnecessary to discuss the matter here.

As a further illustration of this method, let us reconsider the
problem solved in § 62. In that problem we supposed the distribution
function to be known for the variables a and t, and desired a
new distribution function for the single variable w, which was related
to a and t by the equation

$$ w = ta. $$


Let us first attempt to find a new distribution function in terms
of the two new variables

$$ w = ta, \qquad v = a. $$

Then by actual evaluation of the Jacobian, we find that

$$ \frac{\partial(w, v)}{\partial(a, t)} = \begin{vmatrix} t & a \\ 1 & 0 \end{vmatrix} = -a. $$

Hence, disregarding the negative sign, we arrive at the new distribution
function¹

$$ P(w, v) = \left[ \frac{p(a, t)}{a} \right]_{a = v,\; t = w/v}. $$

Finally from (56) we get a distribution function for the single
variable w in the form

$$ p(w) = \int P(w, v)\, dv = \int p\!\left(v, \frac{w}{v}\right) \frac{dv}{v}, $$

¹ The capital P is used to avoid confusion with the last member of the equation.


§ 64. MAXWELL’S EQUATION 163 


which is, except for the introduction of a new symbol for the variable 
of integration, the same as (61). 


§ 64. Derivation of a Distribution Function for the Velocities 
of Gas Molecules 


One of the important problems in Physics to which Prob- 
ability Theory was first applied, and virtually the only one 
in which the so-called Normal Law appears as a consequence 
of an argument which is even approximately sound physically, 
is that of the distribution of velocities within a perfect gas. 
Later on, in § 147, we shall give what is probably its most 
satisfactory proof. For the present we wish to use it prin- 
cipally as an illustration of the processes which we have been 
discussing in the present chapter, and shall therefore content 
ourselves with a “proof” that is far from exact. 

We suppose the gas to be composed of a large number of 
particles, similar or dissimilar as the case may be, but all in 
agitated motion. We further assume that the gas is restrained 
within a vessel which is not itself in motion. The question 
before us is, What is the chance of a particular particle having 
a specified velocity? 

To start with, we want to be sure that we know exactly what
we mean by this question. Suppose we phrase it this way:
A particular particle is tagged for purposes of identification,
and an instant of observation is chosen without any advance
information which influences the probability for which we
seek. At that instant we note the velocity of the particle in
question, both as to magnitude and direction. What is the
chance that its components lie between u and u + du, v and
v + dv, and w and w + dw, respectively?

We make these assumptions: 


(a) The probability of a given velocity is independent of
the part of the vessel in which the particle may be. That is,
p*(u, v, w) does not depend upon the coordinates x, y, z.¹

¹ The asterisk is added to the symbol for probability in order to conform to the
notation of Chapter XI.



Physically there are two common conditions which violate 
this assumption: when different parts of the gas are at dif- 
ferent temperatures; and when they are at different pressures 
as, for example, when it is flowing out of an orifice. 


(b) All directions of motion are equally likely.
To this we cannot object.


(c) The probability of a component u in the x-direction is
independent of any knowledge we may have about the transverse
components v and w.


In this assumption lies the weakness of the argument; 
for it is certainly not obvious that it should be true, either 
from a mathematical or a physical point of view. Our “ proof” 
therefore will not be very convincing. However, we proceed 
with it. 

Let p*(u) be the probability of a component u. By the
assumption (c):

$$ p^*(u, v, w) = p^*(u)\, p^*(v)\, p^*(w), $$

or

$$ \log p^*(u, v, w) = \log p^*(u) + \log p^*(v) + \log p^*(w). $$


By assumption (b) the function p*(u, v, w) must depend
on the combination u² + v² + w² only; for if we were to
replace the three components of velocity by their expressions
in terms of speed and direction, the directional variables
would all disappear, leaving only the speed. Suppose, then,
that we write for the moment

$$ \log p^*(u, v, w) = f(u^2 + v^2 + w^2). $$

We now make our final assumption: that this function f is
of such a nature that it can be expanded in a series for sufficiently
small values of the speed. We then get

$$ \begin{aligned} f(u^2 + v^2 + w^2) &= a_0 + a_1 (u^2 + v^2 + w^2) + a_2 (u^2 + v^2 + w^2)^2 + \cdots \\ &= \log p^*(u) + \log p^*(v) + \log p^*(w). \end{aligned} \tag{72} $$


Obviously the third member of this equation is of such a form
that no cross-products between u, v, w can be allowed. But
such cross-products arise from all the terms of the second member
except a_0 + a_1(u² + v² + w²): hence the coefficients
a_2, a_3, ..., must be zero. Thus

$$ \log p^*(u, v, w) = a_0 + a_1 (u^2 + v^2 + w^2), $$

or

$$ p^*(u, v, w) = A\, e^{a_1 (u^2 + v^2 + w^2)}. $$

This equation contains two constants A and a_1 but they
are not both arbitrary because of the necessary relation

$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p^*(u, v, w)\, du\, dv\, dw = 1, $$

which requires that π³A² = -a_1³. If, then, we call a_1 = -a
we have as our final law¹

$$ p^*(u, v, w) = \left( \frac{a}{\pi} \right)^{3/2} e^{-a (u^2 + v^2 + w^2)}. \tag{73} $$


This is Maxwell’s Equation. It contains only one arbitrary
constant a, and that can be shown to be completely determined
by the temperature of the gas and the mass of the particle
under consideration.
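
The normalizing factor in Maxwell's Equation can be checked numerically. The Python sketch below assumes the law in the form (a/π)^(3/2) exp[-a(u² + v² + w²)] and uses the fact that the triple integral factorizes into the cube of a one-dimensional integral; the truncation at 10/√a and the function names are illustrative choices, not part of the text.

```python
import math


def gaussian_integral(a, n=20000):
    """Midpoint estimate of the integral of exp(-a*u*u) over all u,
    truncated to |u| <= 10/sqrt(a), beyond which the integrand is negligible."""
    limit = 10.0 / math.sqrt(a)
    h = 2.0 * limit / n
    return h * sum(
        math.exp(-a * (-limit + (i + 0.5) * h) ** 2) for i in range(n)
    )


def maxwell_total(a):
    """Integral of the assumed law over all u, v, w: the constant
    (a/pi)^(3/2) times the cube of the one-dimensional integral."""
    A = (a / math.pi) ** 1.5
    return A * gaussian_integral(a) ** 3


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0):
        # Each total should be close to 1, whatever the value of a.
        print(a, maxwell_total(a))
```

Each one-dimensional integral should come out close to √(π/a), which is exactly what the condition π³A² = a³ on the constant A requires.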


§ 65. Some Instructive Illustrations; Change of Variable in 
Maxwell’s Equation 


As an illustration of the type of service which change of 
variable may render us, let us consider the following example: 


Example 40. The probability of a gas molecule having the velocity
components u, v, w being given by (73), find the distribution law in terms
of the variables s (speed), φ (co-latitude) and θ (longitude).


¹ It is interesting historically to note that Maxwell first derived equation (73)
by substantially this line of argument. He called attention to the fact, however, 
that there was no real justification for it, and afterwards attempted to improve his 
demonstration by a line of argument similar to the one given in § 144. I believe he 
was of the opinion that this latter proof was conclusive, and that it was some years 
before attention was called to the fact that it also implies an assumption for which 
there is little greater justification than there is for the assumption that the velocity 
components are independent. 




The coordinate system specified in this example is no
other than the one usually known as “spherical coordinates.”
It is shown in Fig. 20. The relations between the two sets of
variables are immediately seen to be

$$ u = s \sin\phi \cos\theta, \qquad v = s \sin\phi \sin\theta, \qquad w = s \cos\phi. \tag{74} $$

[Fig. 20. Spherical coordinates.]

Knowing these equations we are prepared at once to transform (73)
into our new variables s, φ and θ. It happens, of course, that these
equations give the known variables in terms of the desired ones;
but that is not a disadvantage, for as we have seen, the Jacobian
of one set is the reciprocal of the Jacobian of the other.

We obtain directly

$$ \frac{\partial(u, v, w)}{\partial(s, \phi, \theta)} = \begin{vmatrix} \sin\phi \cos\theta & s \cos\phi \cos\theta & -s \sin\phi \sin\theta \\ \sin\phi \sin\theta & s \cos\phi \sin\theta & s \sin\phi \cos\theta \\ \cos\phi & -s \sin\phi & 0 \end{vmatrix} = s^2 \sin\phi; $$


and therefore

$$ p^*(s, \phi, \theta) = s^2 \sin\phi\; p^*(u, v, w), $$

or, since u² + v² + w² = s²,

$$ p^*(s, \phi, \theta) = \left( \frac{a}{\pi} \right)^{3/2} s^2 \sin\phi\; e^{-a s^2}. \tag{75} $$

This equation is exactly equivalent to (73), but it is expressed
in terms of the polar representation of velocities
instead of the Cartesian representation.
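
The value s² sin φ for the Jacobian of the spherical transformation can be confirmed numerically. The Python sketch below differentiates the relations u = s sin φ cos θ, v = s sin φ sin θ, w = s cos φ by central differences and expands the resulting 3 by 3 determinant; the helper names and step size are assumptions made for illustration.

```python
import math


def velocity(s, phi, theta):
    """Cartesian components from speed s, co-latitude phi, longitude theta."""
    return (
        s * math.sin(phi) * math.cos(theta),
        s * math.sin(phi) * math.sin(theta),
        s * math.cos(phi),
    )


def jacobian_3d(transform, point, h=1e-6):
    """Central-difference Jacobian determinant of a map from three
    variables to three variables, evaluated at point."""
    m = []
    for k in range(3):
        hi = list(point)
        lo = list(point)
        hi[k] += h
        lo[k] -= h
        f_hi, f_lo = transform(*hi), transform(*lo)
        # m[k][i] is the derivative of output i with respect to input k.
        m.append([(f_hi[i] - f_lo[i]) / (2 * h) for i in range(3)])
    # m is the transpose of the Jacobian matrix; the determinant is the same.
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


if __name__ == "__main__":
    s, phi, theta = 1.5, 0.8, 2.1
    print(jacobian_3d(velocity, (s, phi, theta)), s * s * math.sin(phi))
```

The two printed values should agree to within the accuracy of the finite differences.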


This equation reveals a treacherous point in our argument. By
assumption (b), § 64, all directions should be equally likely; and we
have already remarked that because of that assumption we were
justified in assuming p*(u, v, w) to be a function of s² only. Yet
(75) contains φ as well as s², and on first thought would seem to
violate one of the assumptions upon which it is based.

The explanation is a simple one. By the brief expression “all
directions are equally likely” we really meant this sensible thing:
that if a direction were named, the chance of our particle deviating
from that direction by an angle less than da, say, would not depend
upon the chosen direction. Or, in still other words, if at some
instant lines were drawn through the particle in any two directions,
and if like cones were instantaneously described about both, the
chance of the direction of motion of the particle being within one
cone would equal the chance of it being in the other.

Now, if a sphere were described about the common apex of these
two cones, each cone would cut out an area on the sphere, and
obviously these two areas would be equal. Hence assumption (b)
requires the probability of the direction of motion intersecting this
sphere within a certain area to be the same no matter where on the
surface of the sphere the area may occur. The element dφ dθ, however,
is not of the same area at all places: indeed the element of
area is just r² sin φ dφ dθ, if r is the radius; and if we define “element
of direction” as the solid angle subtended by this area, the “element
of direction” is just sin φ dφ dθ. Hence the form of (75), which
may be written

$$ p^*(s, \phi, \theta)\, ds\, d\phi\, d\theta = \left( \frac{a}{\pi} \right)^{3/2} s^2 e^{-a s^2}\, ds\, (\sin\phi\, d\phi\, d\theta), $$

is really in accordance with assumption (b).


§ 66. Information that Can be Derived from (73) and (75) 


From (75) we can easily find the probability that the
speed of a molecule lies between s and s + ds, if we do not
care what direction is associated with that speed. We need
only integrate p*(s, φ, θ) ds dφ dθ over all possible values of
θ and φ; that is, we need only sum up, for every possible
direction, the probability that the desired speed s occurs in
combination with that direction. However, to include every
possible direction we must allow θ to vary from 0 to 2π, and
φ from 0 to π. Hence

$$ p^*(s)\, ds = \left( \frac{a}{\pi} \right)^{3/2} s^2 e^{-a s^2}\, ds \int_0^{\pi} d\phi \int_0^{2\pi} d\theta\, \sin\phi. $$

Integration gives at once

$$ p^*(s) = 4\pi \left( \frac{a}{\pi} \right)^{3/2} s^2 e^{-a s^2}. \tag{76} $$


The essential thing to remember about these formulae is
that (73) and (75) give the probability of a molecule having a
specified velocity (direction included), while (76) gives the
probability of it having a definite speed.
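
The passage from (75) to (76) can be checked numerically. The following is a minimal sketch, not part of the original argument: the helper `midpoint_integral`, the sample value `a = 1.7` and the cut-off of the speed range are all assumptions made for illustration. It sums the element of direction over the whole sphere and then confirms that the speed distribution (76) has total probability one.

```python
import math

def maxwell_speed_density(s, a):
    """Equation (76): p*(s) = 4 sqrt(a^3/pi) s^2 exp(-a s^2)."""
    return 4.0 * math.sqrt(a**3 / math.pi) * s * s * math.exp(-a * s * s)

def midpoint_integral(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

a = 1.7  # hypothetical value of the constant a

# Integral of sin(phi) dphi dtheta over all directions; theta contributes 2*pi.
solid_angle = 2.0 * math.pi * midpoint_integral(math.sin, 0.0, math.pi)

# Total probability of (76); the upper limit is far out in the tail.
total = midpoint_integral(lambda s: maxwell_speed_density(s, a),
                          0.0, 12.0 / math.sqrt(a))

print(solid_angle, 4.0 * math.pi)
print(total)
```

The factor $4\pi$ found for the directions, multiplied by $(a/\pi)^{3/2}$, is the coefficient $4\sqrt{a^3/\pi}$ of (76).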

Returning again to (73) we ask for the probability that a 
molecule has u as its component of velocity in the x-direction, 
regardless of the transverse velocity components with which 
u may be associated. We obtain this answer from (73) by 
exactly the same process as was used in obtaining (76) from 
(75). That is, we sum up $p^*(u,v,w)\,du\,dv\,dw$ for every
value of $v$ and $w$ which can possibly be associated with the
$u$ in question, thereby obtaining

$$p^*(u) = \int dv \int dw\; p^*(u,v,w) = \left(\frac{a}{\pi}\right)^{3/2} e^{-au^2} \int_{-\infty}^{\infty} e^{-av^2}\,dv \int_{-\infty}^{\infty} e^{-aw^2}\,dw.$$


Each of the integrals involved in this expression is in exactly 
the same form. If we evaluate one of them, therefore, we 
shall automatically have the value of the other also. We 
choose to deal with the one in $w$. Since $e^{-aw^2}$ is an even
function, its integral between the limits $-\infty$ and 0 must
be equal to its integral from 0 to $+\infty$. We may therefore
compute the integral over the latter range and double its value.
However, by means of the substitution $y = aw^2$ we obtain

$$2\int_0^{\infty} e^{-aw^2}\,dw = \frac{1}{\sqrt{a}} \int_0^{\infty} y^{-1/2} e^{-y}\,dy;$$

and this integral in $y$ is by definition $(-\tfrac12)!$, the value of which
is known to be $\sqrt{\pi}$. Hence each of the integrals which we
have to evaluate is $\sqrt{\pi/a}$, and their product is $\pi/a$. This
leads to

$$p^*(u) = \sqrt{\frac{a}{\pi}}\; e^{-au^2}. \qquad (77)$$


Since (73) is symmetrical in $u$, $v$ and $w$ we may write at
once the distribution functions for the $v$ and $w$ components
of velocity,

$$p^*(v) = \sqrt{\frac{a}{\pi}}\; e^{-av^2},$$

$$p^*(w) = \sqrt{\frac{a}{\pi}}\; e^{-aw^2}.$$


Were we not interested in this place in illustrating the formal 
operations by means of which distribution functions can be manipu- 
lated from one form to another, we could have obtained (77) much 
more easily from (72). It is at once obvious that $\log p^*(u)$ must
be the sum of $a_1 u^2$ and some constant. Moreover $\log p^*(v)$ and
$\log p^*(w)$ must also contain this same constant, which therefore is
$a_0/3$. This leads at once to (77).


Another type of question which arises frequently in physical 
problems concerns the sort of molecules which pass through a 
given surface. For instance, we may be thinking of diffusion 
of a gas through a hole in the wall of the containing vessel, 
or we may be thinking of the molecules which pass across an 
imaginary mathematical surface somewhere inside the vessel. 
The two cases differ principally in the fact that in the former 
the molecules cross the surface in one direction only, whereas 
in the latter they cross in both directions. We choose to 
discuss the latter case. 

To be absolutely specific in our thinking, we phrase the 
question this way: If a particular molecule has crossed a 
particular element of surface during a particular short interval 
of time dt, what is the chance that it had the velocity compo- 
nents u,v, w? 

This is obviously a case for the application of Bayes’
Theorem; for if we call “having the velocity in question”
event $A$, and “crossing the surface” event $B$, the question
really is, “$B$ having happened, what is the chance that it
was accompanied by $A$?” The answer, we know, is

$$p_B^*(u,v,w) = \frac{p^*(u,v,w)\,p_{uvw}(B)}{p(B)}.$$


Of the quantities on the right, p*(u, v, w) is given by (73), 
and of course the denominator can be found by summing 
expressions such as the one in the numerator, provided $p_{uvw}(B)$
can be found. Virtually the entire problem, therefore, lies 
in finding this last quantity. 

Let us make our choice of axes in such a way that the 
x-axis is normal to the particular element of area which we 
have under consideration, and which we will denote by $dA$.
Then whether the molecule in question crosses or does not 
cross the surface depends upon its position at the beginning 
of the time interval dt. The simplest method of explaining 
where it must have been in order to get across is to imagine 
that it itself is somehow brought to rest at the beginning of 
the interval, and simultaneously the element of area is given 
a velocity —u, —v, —w. The relative velocities remain 
unaltered, and the molecule would have passed through the 
area if, and only if, the area passes over the molecule. By this 
process, however, the area is made to generate a solid, the 
volume of which is its base $dA$ multiplied by its perpendicular
height. That height is evidently equal to the arithmetical
value of $u\,dt$, which we denote as usual $|u|\,dt$. Hence the
volume element is $|u|\,dA\,dt$.

Our molecule, however, was tagged without reference
to its position: it is just as likely to be in one part of the
vessel as another. If, then, we denote the total volume
by $V$, we have $p_{uvw}(B) = |u|\,dA\,dt/V$. This is our numerator.

To get $p(B)$ we must sum this compound probability over
every possible event $A$ with which it may be associated: that is,
over every possible set of values of $u$, $v$ and $w$. This gives us,
when common factors are cancelled,

$$p_B^*(u,v,w) = \frac{|u|\,e^{-a(u^2+v^2+w^2)}}{\displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} |u|\,e^{-a(u^2+v^2+w^2)}\,du\,dv\,dw}.$$


We have already found that the $v$ and $w$ integrals together
give a factor $\pi/a$. The $u$ integral is complicated somewhat
by the presence of the absolute value of $u$ in its integrand.
Upon noting, however, that $|u|$ equals $+u$ when $u$ is positive
and $-u$ when $u$ is negative, we easily break the integral up
into two parts each of which can readily be evaluated. The
answer is $1/a$; whence the entire denominator becomes $\pi/a^2$.

Thus finally,

$$p_B^*(u,v,w) = \frac{a^2}{\pi}\,|u|\,e^{-a(u^2+v^2+w^2)}. \qquad (78)$$



This is the answer required. 
We shall not follow the problems of the kinetic theory 
further at this time.¹
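
The denominator used in reaching (78) can be verified by direct numerical integration. This is only a sketch under stated assumptions: the value `a = 2.5`, the helper `midpoint_integral` and the finite limits standing in for $\pm\infty$ are choices made for illustration.

```python
import math

def midpoint_integral(f, lo, hi, n=40000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

a = 2.5                      # hypothetical value of the constant a
L = 12.0 / math.sqrt(a)      # stands in for infinity; the tails are negligible

# The u integral, with |u| in the integrand: expected to be 1/a.
u_integral = midpoint_integral(lambda u: abs(u) * math.exp(-a * u * u), -L, L)

# The v and w integrals together: expected to be pi/a.
vw_factor = midpoint_integral(lambda v: math.exp(-a * v * v), -L, L) ** 2

# The whole denominator: expected to be pi/a^2.
denominator = u_integral * vw_factor

print(u_integral, 1.0 / a)
print(denominator, math.pi / a**2)
```

Dividing $|u|\,e^{-a(u^2+v^2+w^2)}$ by this denominator gives the coefficient $a^2/\pi$ of (78).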


§67. Some Instructive Illustrations; A More Complicated 
Jacobian 

As a final illustration of the use of the Jacobian in trans- 
forming a distribution function from one set of variables to 
another, we choose the following, which will be of service to 
us later on. 

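EXAMPLE 41. Six new variables, $\bar u, \bar v, \bar w, \bar u', \bar v', \bar w'$, are defined in
terms of six old variables $u, v, w, u', v', w'$, by means of the equations

$$\begin{aligned}
\bar u &= u - \lambda S, & \bar u' &= u' - \lambda S,\\
\bar v &= v - \mu S, & \bar v' &= v' - \mu S, \qquad (79)\\
\bar w &= w - \nu S, & \bar w' &= w' - \nu S,
\end{aligned}$$

where $S$ is written briefly for the expression

$$S = \lambda(u - u') + \mu(v - v') + \nu(w - w'),$$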


1 They will be considered in greater detail in Chapter XI. 




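while $\lambda$, $\mu$ and $\nu$ are constants. What is the ratio of corresponding
volume elements $d\bar A$ and $dA$ in the two manifolds?*


By straightforward differentiation we find

$$\frac{\partial \bar u}{\partial u} = 1 - \lambda^2, \qquad \frac{\partial \bar u}{\partial v} = -\lambda\mu, \qquad \frac{\partial \bar u}{\partial v'} = \lambda\mu,$$

and so on, the complete Jacobian being

$$\frac{d\bar A}{dA} = \begin{vmatrix}
1-\lambda^2 & -\lambda\mu & -\lambda\nu & \lambda^2 & \lambda\mu & \lambda\nu\\
-\lambda\mu & 1-\mu^2 & -\mu\nu & \lambda\mu & \mu^2 & \mu\nu\\
-\lambda\nu & -\mu\nu & 1-\nu^2 & \lambda\nu & \mu\nu & \nu^2\\
-\lambda^2 & -\lambda\mu & -\lambda\nu & 1+\lambda^2 & \lambda\mu & \lambda\nu\\
-\lambda\mu & -\mu^2 & -\mu\nu & \lambda\mu & 1+\mu^2 & \mu\nu\\
-\lambda\nu & -\mu\nu & -\nu^2 & \lambda\nu & \mu\nu & 1+\nu^2
\end{vmatrix}.$$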

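This determinant can be greatly simplified by a few elementary
transformations. Thus adding the fourth column
to the first, the fifth to the second and the last to the third,
gives

$$\frac{d\bar A}{dA} = \begin{vmatrix}
1 & 0 & 0 & \lambda^2 & \lambda\mu & \lambda\nu\\
0 & 1 & 0 & \lambda\mu & \mu^2 & \mu\nu\\
0 & 0 & 1 & \lambda\nu & \mu\nu & \nu^2\\
1 & 0 & 0 & 1+\lambda^2 & \lambda\mu & \lambda\nu\\
0 & 1 & 0 & \lambda\mu & 1+\mu^2 & \mu\nu\\
0 & 0 & 1 & \lambda\nu & \mu\nu & 1+\nu^2
\end{vmatrix}.$$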


* “Manifold” is a technical term for what is often called “space of n dimensions.”
Each of our sets of coordinates would require a six-dimensional space, or manifold,
to represent it.




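Next, subtracting the first row from the fourth, the second
from the fifth, and the third from the last gives

$$\frac{d\bar A}{dA} = \begin{vmatrix}
1 & 0 & 0 & \lambda^2 & \lambda\mu & \lambda\nu\\
0 & 1 & 0 & \lambda\mu & \mu^2 & \mu\nu\\
0 & 0 & 1 & \lambda\nu & \mu\nu & \nu^2\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{vmatrix}.$$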


As all the elements below the diagonal are now zero, the
value of the determinant is simply the product of the terms
of the principal diagonal. Hence

$$\frac{d\bar A}{dA} = 1.$$

In the case of this transformation, therefore, the elements
are equal in both systems.
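
The value of this determinant can also be confirmed by brute force. The sketch below is an illustration only: it assumes the transformation (79) in the form "each new variable equals the old one less a constant times $S$", and the function names `jacobian_matrix` and `determinant`, as well as the sample constants, are hypothetical. Exact rational arithmetic is used so that the result is not blurred by rounding.

```python
from fractions import Fraction

def jacobian_matrix(lam, mu, nu):
    """Partial derivatives of the six new variables of (79) with respect to
    the six old ones, in the order u, v, w, u', v', w'."""
    direction = [lam, mu, nu]
    # S = lam*(u - u') + mu*(v - v') + nu*(w - w'); its gradient is:
    grad_s = direction + [-c for c in direction]
    # Each new variable is the old one minus coeff * S.
    coeff = direction + direction
    return [[(1 if i == j else 0) - coeff[i] * grad_s[j] for j in range(6)]
            for i in range(6)]

def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det            # a row interchange changes the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

jac = jacobian_matrix(Fraction(2, 7), Fraction(3, 7), Fraction(6, 7))
print(determinant(jac))
```

The same value is obtained for any other choice of the three constants, since the column and row operations of the text never made use of their particular values.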


§ 68. The General Significance of the Jacobian 


It is not only in probability theory that the Jacobian is of interest. 
What we have really said in arriving at it is this: We have a function 
$p(x, y, z, \ldots, w)$, which, when multiplied by $dA = dx\,dy\,dz \cdots dw$
and integrated over a certain part of the manifold in which those
variables are represented, gives a number in which we are interested.
We also know a set of relations by means of which $x, y, z, \ldots, w$
can be represented in terms of new variables $\xi, \eta, \zeta, \ldots, \omega$. It is
therefore a simple matter to substitute these relations in $p$ and get a
formula which will give us the value which this function takes at
every point of the Greek manifold. However, if we multiply this
quantity by $d\alpha = d\xi\,d\eta\,d\zeta \cdots d\omega$, and integrate over the proper
part of this manifold, we will not get the same number as before,
because $d\alpha$ and the $dA$ to which it corresponds are not equal. To
get back to the same number, we must multiply $p$ at every point by
the ratio $dA/d\alpha$ at that point, before we integrate.

For example, in computing the area of a circle, which is really
integrating the function $p(x, y) = 1$, we might desire to use polar
coordinates $r$ and $\theta$ defined by the equations

$$x = r\cos\theta, \qquad y = r\sin\theta. \qquad (80)$$

Of course, since $p$ did not contain $x$ and $y$, it does not contain $r$ and $\theta$
either. But it is not true that the area can be found by integrating
$dr\,d\theta$ over the circle in question.

In ordinary calculus we say the reason for this is that $dr\,d\theta$ is not
the element of area. Instead the element of area is $r\,dr\,d\theta$. It is
obvious that this $r$ is nothing else than the Jacobian of the
transformation.
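
The point is easily illustrated by a crude numerical integration. The following sketch is an illustration under assumptions of our own (the radius 3, the grid sizes and the helper name `polar_integral` are not from the text): it sums over a grid in $r$ and $\theta$, once with the factor $r$ and once without it.

```python
import math

def polar_integral(element, radius, n_r=400, n_theta=90):
    """Midpoint sum of element(r, theta) * dr * dtheta over the disc
    0 <= r <= radius, 0 <= theta < 2*pi."""
    dr = radius / n_r
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            total += element(r, theta) * dr * dtheta
    return total

radius = 3.0
with_jacobian = polar_integral(lambda r, theta: r, radius)       # r dr dtheta
without_jacobian = polar_integral(lambda r, theta: 1.0, radius)  # dr dtheta

print(with_jacobian, math.pi * radius**2)
print(without_jacobian, 2.0 * math.pi * radius)
```

Only the first sum reproduces the area $\pi r^2$; the second merely measures the rectangle of sides $r$ and $2\pi$ in the $(r, \theta)$ manifold.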

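In similar fashion, the “element of integration” in the set of
coordinates $\xi, \eta, \zeta, \ldots, \omega$ is always

$$\frac{\partial(x, y, z, \ldots, w)}{\partial(\xi, \eta, \zeta, \ldots, \omega)}\,d\xi\,d\eta\,d\zeta \cdots d\omega.$$
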


The only difference between the Calculus point of view, which is the 
more useful one in most lines of investigation, and that which we have 
used in our discussion of probability, is that in the latter case it 
proved to be most satisfactory to include the Jacobian as part of 
the new distribution function, whereas it is generally simpler to think 
of it as a part of the differential element instead. 


PROBLEMS 
1. Find the Jacobian of the transformation (80). 
2. Find the volume element in cylindrical coordinates. 


3. Planck’s radiation formula

$$p(\nu) = \frac{a\nu^3}{e^{b\nu} - 1} \qquad (81)$$

can be interpreted as a “distribution function” in which the variable
is the frequency of the light. That is, the probability of unit energy
being emitted between frequencies $\nu$ and $\nu + d\nu$ is given by (81).
Find the probability of unit energy being emitted between the
wave-lengths $\lambda$ and $\lambda + d\lambda$.


4. What is the probability that the energy of a gas molecule lies 
between W and W + dW? 


5. When a gas diffuses outward through an orifice the equilibrium
conditions within the enclosure are destroyed. It is therefore no
longer true that the probability of a molecule having specified velocity
components is independent of its position. This renders invalid the
argument by means of which we determined $p_{uvw}(B)$ in § 66.

Sometimes in physical problems, however, we think of the elec- 
trons within a metal as behaving just like the molecules of a gas. 
If the metal is hot, “thermionic” electrons leak off its surface. 
They are supposed to get out in such small numbers as not seriously 
to upset the equilibrium within the metal. Assume this to be true; 
also assume one is emitted if and only if its x-component of velocity
exceeds a positive quantity $\sqrt{E}$.

If the history of an emitted electron is traced back to a moment
just before emission, what is the probability that its velocity
components were $u, v, w$?


6. In the case of the thermions mentioned in Problem 5 it is
supposed that those which emerge have their $v$ and $w$ unchanged,
but that their velocity in the x-direction is changed to a new value
$u'$ defined by the law $u^2 - u'^2 = E$.

Assuming this to be true, what is the distribution of velocities 
after emission? 


7. The can containing a gas carries a set of axes $x, y, z$. It is
being translated relatively to a “fixed” set of axes $x', y', z'$ with a
velocity $U, V, W$. By the principle of relativity this is the same
thing mechanically as if the $x, y, z$ were “fixed” and the axes $x', y', z'$
were moving with a velocity $-U, -V, -W$ with respect to them.
Of course, if the argument of § 64 is true at all, it applies to the can
and the axes $x, y, z$. Find the chance that a particular gas molecule
has components of velocity $u', v', w'$ with respect to the axes $x', y', z'$.


8. Find the volume element in toroidal¹ coordinates; that is, in
a system in which any point $P$ which lies in the xz-plane has the
coordinates $r, \theta, 0$, while if it does not lie in that plane, its first two
coordinates are determined in exactly the same manner in the plane
which contains both the point and the z-axis, while the third
coordinate is the angle between this plane and the xz-plane. The system is
illustrated by Fig. 21.


¹ The name “toroidal” is due to the fact that any surface defined by the equation
$r$ = constant is a “torus” (that is, a doughnut). The other coordinate surfaces are:
for $\theta$ constant, segments of cones having the z-axis as axes; for $\phi$ constant, planes
containing the z-axis.


9. In aiming at a target, the bull’s-eye will not always be hit.
We choose to think of firing over so short a range that there is no
appreciable curvature of path and make the following assumptions
(see Fig. 22):
(a) The chance of a horizontal error $h$ in aim is totally
independent of any error $v$ vertically.
(b) The chance of lying in an angular sector $d\theta$ is the same
for any such sector.

Find the chance of a shot having an error between $(h, v)$ and
$(h + dh, v + dv)$.


10. In the case of Problem 9, find the chance of the shot hitting
between $(r, \theta)$ and $(r + dr, \theta + d\theta)$.


FIG. 21. TOROIDAL COORDINATES.    FIG. 22.


11. Suppose the target of Problem 9 to be inclined with respect
to the vertical at an angle $\alpha$. Find the probability of a shot falling
between $(h', v')$ and $(h' + dh', v' + dv')$ on this target, the $h'$ and $v'$
being supposed to be measured along its surface.


12. In Problem 9, if we choose any direction $\theta$, there is a distance
$r(\theta)$ at which the probability of lying within a given element of area
$dA$ is exactly $p^*$. For some other direction the same will be true,
though the new distance need not necessarily be the same. The curve

$$r = r(\theta)$$

along all points of which the probability has the same value $p^*$ is
called a “curve of equal probability.” Find the equation of these
curves.


13. Find the curves of equal probability in Problem 11. 


REFERENCES FOR OUTSIDE READING


1. COOLIDGE: Probability, Chapters I and V.
2. CZUBER: Wahrscheinlichkeitsrechnung, pp. 273-286; 389-400.


CHAPTER VII 
AVERAGES 


§ 69. Definition of an Average 


The average of a set of quantities is such a quantity that 
if every member of the set were replaced by it, their aggregate 
would remain unchanged. 


For example, the average weight of a group of men is such 
that if every man were of the same weight, their aggregate 
weight would be unaltered. If there were three of them, 
weighing 140, 160 and 195 pounds, the average would be 165. 

Obviously, if the set contains $m$ quantities, of which $n_1$
have the magnitude $a_1$, $n_2$ the magnitude $a_2$, and so on, this
definition is equivalent to the equation

$$m\,\bar a = \sum a_j n_j,$$

which leads at once to the formula¹

$$\mu_1(a) = \bar a = \frac{1}{m}\sum a_j n_j \qquad (82)$$

for computing an average.


The following theorem is almost obvious: If $\bar a$ is the average
of a set of numbers $a_1, a_2, \ldots, a_n$, and $\bar b$ is the average of
$b_1, b_2, \ldots, b_n$, the average of the sums $a_1 + b_1, a_2 + b_2, \ldots,
a_n + b_n$ is $\bar a + \bar b$.
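
Both the formula (82) and the theorem just stated are readily tried on numbers. The following is a small sketch; the function name `average` and the two illustrative sets `a` and `b` are assumptions of ours, while the three weights are those of the example above.

```python
from fractions import Fraction

def average(pairs):
    """Formula (82): pairs is a list of (magnitude a_j, count n_j)."""
    m = sum(n for _, n in pairs)
    return Fraction(sum(a * n for a, n in pairs), m)

# The three men of the text, each weight occurring once.
weights = [(140, 1), (160, 1), (195, 1)]
print(average(weights))

# The theorem on the average of sums, tried on two hypothetical sets.
a = [3, 8, 8, 13]
b = [10, 2, 6, 6]
avg_a = average([(x, 1) for x in a])
avg_b = average([(x, 1) for x in b])
avg_sum = average([(x + y, 1) for x, y in zip(a, b)])
print(avg_a, avg_b, avg_sum)

# Grouping equal magnitudes, as in (82), does not change the result.
grouped_a = average([(3, 1), (8, 2), (13, 1)])
```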


§ 70. Mathematical Expectation 


Suppose there is a variable $a$ which is capable of taking
on any one of the values $a_1, a_2, a_3, \ldots, a_s$, and suppose that
the chance of it taking the value $a_1$ is $p_1$, the chance of it
taking the value $a_2$ is $p_2$, and so on. Suppose finally that a
total of $m$ independent trials of this quantity is made, and
that in $n_1$ of them its value is $a_1$, in $n_2$ its value is $a_2$, and so on.
Obviously the aggregate value of the quantity $a$ in all the
trials is $a_1 n_1 + a_2 n_2 + \cdots + a_s n_s$, and its average value per
trial is

$$\bar a = a_1\frac{n_1}{m} + a_2\frac{n_2}{m} + \cdots + a_s\frac{n_s}{m}.$$


¹ $\mu_1(a)$ means “the first moment of $a$”. The reason for this notation will appear
in § 71.
m m ™m 


We cannot predict in advance what this average $\bar a$ is
going to be: that is, we cannot predict it with certainty.
The only certain way is to try it. But we know from Bernoulli’s
Theorem that the CHANCE of $n_1/m$ differing from $p_1$
by an important amount becomes smaller and smaller the
larger $m$ is made, and a similar argument applies to the other
ratios also. Hence, in a large number of trials there is little
chance that $\bar a$ will differ much from $a_1 p_1 + a_2 p_2 + \cdots + a_s p_s$.
We call this quantity the “mathematical expectation” of $a$.


Definition: If $a$ can take only the values $a_1, a_2, \ldots, a_s$, and
zero, the probability of each being $p(a_1), p(a_2), \ldots, p(a_s)$ and
$p(0)$, the mathematical expectation of $a$ is*

$$\epsilon_1(a) = \sum a_j\,p(a_j). \qquad (83)$$


The following theorem is at once obvious: 


Theorem: If a large number of independent trials of the value 
of a are made, the chance that its average value per trial differs 
from its mathematical expectation by more than some preassigned 
quantity is small, and may be made as small as we please by 
sufficiently increasing the number of trials. 


* Read “the expectation of $a$”. The reason for the subscript 1 will appear in
§ 71. When no confusion can arise, $\epsilon_1(a)$ and $\mu_1(a)$ will often be written simply $\epsilon_1$
and $\mu_1$.


Strictly speaking, this theorem is only obvious when the number
of possible values of $a$ is finite. For when there is an infinity of
terms, it is not necessarily true that the sum of their limits is equal
to the limit approached by their sum. In other words, it is not
always proper to take limits of individual terms and add them
together unless the number of terms in the summation is finite.
This is illustrated, for instance, by the set of terms

$$\sin x,\quad \frac{\sin 3x}{3},\quad \frac{\sin 5x}{5},\quad \ldots,$$

each of which approaches the limit zero as $x$ approaches $\pi$. The
sum of the limits approached by the separate terms is therefore
$0 + 0 + \cdots = 0$. But the sum of the terms themselves is a
Fourier series which represents the constant $+\pi/4$ for values of
$x$ between 0 and $\pi$, and the constant $-\pi/4$ for values of $x$ between
$\pi$ and $2\pi$. Therefore the sum of an infinite set of such terms
approaches either $+\pi/4$ or $-\pi/4$ as a limit, according to whether the
limit is approached by starting from values of $x$ smaller than $\pi$ and
ascending toward $\pi$ or by starting from values of $x$ larger than $\pi$
and descending toward it. In no case, however, can the limit of
the sum be made to approach the value 0.

So in the case of our theorem. There are problems in which 
a very large difference between experimental average and expectation 
is almost certain, no matter how large the number of trials. One is 
given in §78. I know of no case, however, in which such a problem 
has any practical importance. 

Another remark should be made in passing. In the sense in
which we are using the words, an “average” is the result of
experiment, while a “mathematical expectation” is an advance judgment
as to what we had a right to expect that average to be. This usage
is not universal. Many writers use the two terms more or less
indiscriminately; others use “average” for either idea, and
“expectation” only when a valuable consideration is involved.

We shall attempt to keep the ideas distinct; though we shall
usually say “expectation” in preference to the more cumbersome
“mathematical expectation.”¹


A simple example will make the general idea of expectation 
somewhat clearer: 


EXAMPLE 42. Two dice are tossed. If seven appears the player
receives a dollar; otherwise nothing. What is his expectation of gain?



1There is another rather mystical concept of “moral expectation” to which 
we shall refer in passing in §79. It was introduced years ago in a futile attempt to 
explain away something which can better be explained without it; and though it is 
no longer believed in by anyone, it is still accorded pentasyllabic lip-worship in our 
adherence to the adjective ‘‘ mathematical.” 



Here the “quantity $a$” represents money won, and can
take only two values, one and zero. The probability of the
first is the same as the probability of a seven appearing, which
is $\tfrac16$; the probability of the other is $\tfrac56$. Hence

$$\epsilon_1(a) = \tfrac16 \cdot 1 + \tfrac56 \cdot 0 = \tfrac16.$$


The player’s expectation of gain is therefore one-sixth of a 
dollar. 

Suppose, now, that the player were required to pay a fixed
sum $A$ per throw. After a large number of games he would
have received some average amount $\bar a$ per game: if $A > \bar a$
he would be the loser, and if $A < \bar a$ the winner. We already
know, however, that $\bar a$ is not likely to deviate appreciably
from $\epsilon_1$ if the number of games is great. Hence if $A > \epsilon_1$,
that is, if he pays more than 16⅔ cents per game, he is almost
certain to lose in the long run; for no matter by how little
his payments may exceed his expectation, the probability of
his average winnings differing from it by as much will approach
zero as the number of games is increased. Conversely, if his
payments are less than his expectation, he is almost sure to
win.

It is obvious, then, that the game is not fair in either of 
these cases; and we conclude that what he could fairly be 
required to pay per game is a sum equal to his expectation. 

But suppose he did pay 16⅔ cents per game; then what?
After he has played $m$ games he will have won some number $n$,
and his net winnings will have been $n - \tfrac16 m$. By the first
half of Bernoulli’s Theorem we know that if $m$ is large enough
$n$ will almost certainly differ from $\tfrac16 m$ by more than any
preassigned quantity, however large. Hence, given games
enough, the player is almost sure to either win or lose a very
large sum; but which it will be we cannot say. By the second
half of Bernoulli’s Theorem we are equally certain that
$\frac{n}{m} - \frac16$, or in other symbols $\bar a - \epsilon_1$, will be negligible: the
average net gain per game will very probably be small.
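
These statements lend themselves to trial by simulation. The sketch below is offered only as an illustration: the number of games, the seed of the random number generator, the alternative fee of 25 cents and the names `expectation_of_gain` and `play` are all assumptions of ours. The expectation itself is found exactly by counting the thirty-six equally likely throws.

```python
import random
from fractions import Fraction

def expectation_of_gain():
    """Exact expectation for Example 42 by enumerating the 36 outcomes."""
    wins = sum(1 for d1 in range(1, 7) for d2 in range(1, 7) if d1 + d2 == 7)
    return Fraction(wins, 36) * 1 + Fraction(36 - wins, 36) * 0

def play(games, fee, rng):
    """Return (aggregate net gain, average net gain per game) in dollars."""
    won = sum(1 for _ in range(games)
              if rng.randint(1, 6) + rng.randint(1, 6) == 7)
    net = won - fee * games
    return net, net / games

rng = random.Random(12345)           # fixed seed so that the run is repeatable
fair_fee = float(expectation_of_gain())

net_fair, avg_fair = play(200_000, fair_fee, rng)   # paying the expectation
net_high, avg_high = play(200_000, 0.25, rng)       # paying too much

print(expectation_of_gain())
print(net_fair, avg_fair)
print(net_high, avg_high)
```

With the fair fee the average per game comes out close to zero while the aggregate may be a sum of some size in either direction; with the higher fee the average loss per game settles near the excess of the fee over the expectation.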




Now what is true of this example is true in general: Whenever 
a gambling game is conducted by repeated independent trials, 
a player who pays out more per game than his expectation of 
winning is virtually certain in the long run to experience an
average loss per game substantially equal to the difference; if 
he pays less than his expectation, he is virtually certain in the 
long run to experience an average gain per game substantially 
equal to the difference; while if he pays exactly his expectation his 
average loss or gain per game will almost certainly be negligible, 
though the aggregate will probably be large. 

Let us put this in more vivid terms: If you pay too much 
you lose a lot; if too little the other fellow does; and if just 
the right amount one of you loses, but not such a whale of a lot. 

Insurance viewed from the companies’ standpoint nearly 
duplicates the conditions of this problem: it would exactly 
duplicate them if every risk were “equally good,”¹ “equal in
amount,”² and paid for in a single premium.³

A few decades ago it was customary for fraternal life 
companies to pay too much per game (that is, the face value of 
their policies was higher than the premium justified), and they 
actually lost money. Conservative companies, on the other 
hand, pay too little per game (that is, they charge decidedly 
more than the policies are worth, though the mutual com- 
panies return most of it in the form of dividends). They 
are “gambling on a sure thing.” If, instead, they charged 
the price which is mathematically “ fair,” they would be about 
as likely to go bankrupt as not, which would certainly not be a 
benefit to their policy-holders. 


¹ In the case of life insurance, if every man of a given age were equally likely to
die at a given time, so far as the company knew. This is not actually true, due to the
knowledge they acquire from their physical examinations.

² That is, if every life were insured for the same amount.

³ In practice, the payment of yearly premiums complicates the computation of
the company’s “expectation of gain.” It does not, however, affect what we are
about to say.


EXAMPLE 43. A penny is tossed repeatedly. If heads appears
for the first time on the n’th throw, the player receives n cents
and a new game begins. Thus if tails appears twice, followed by
heads, the three throws constitute a “game,” and the “winnings” are
three cents. What is the expectation of gain per game?


If heads appears for the first time on the first throw the
gain is 1; the probability of this is $p(1) = \tfrac12$. For tails
followed by heads the numbers are 2 and $p(2) = \tfrac14$. In general,
the probability of a gain $j$ is $p(j) = 1/2^j$. Hence

$$\epsilon_1 = \sum_{j=1}^{\infty} \frac{j}{2^j}.$$

This sum can easily be shown to be 2.¹ To make a fair
game, therefore, the player would have to pay the “bank”
two cents per game.
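
The approach of this sum to the value 2 may be watched directly. The following sketch, in which the function names `partial_expectation` and `closed_form` are our own, adds up the first few terms in exact fractions and compares them with the closed expression $x/(1-x)^2$ for the series $x + 2x^2 + 3x^3 + \cdots$ evaluated at $x = \tfrac12$.

```python
from fractions import Fraction

def partial_expectation(terms):
    """Sum of j / 2^j for j = 1, ..., terms (gain j with probability 1/2^j)."""
    return sum(Fraction(j, 2**j) for j in range(1, terms + 1))

def closed_form(x):
    """Sum of the series x + 2x^2 + 3x^3 + ... for |x| < 1."""
    return x / (1 - x) ** 2

for terms in (1, 2, 5, 10, 20):
    print(terms, float(partial_expectation(terms)))

print(closed_form(Fraction(1, 2)))
```

The deficiency of the partial sum after $N$ terms is $(N+2)/2^N$, so that twenty terms already agree with the limit to about five decimal places.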


¹ Let $y = x + 2x^2 + 3x^3 + \cdots$. Then $y/x = 1 + 2x + 3x^2 + \cdots$; and
$\int (y/x)\,dx = a_0 + 1 + x + x^2 + x^3 + \cdots$, the constant of integration being written
in the form $a_0 + 1$. But $1 + x + x^2 + \cdots$ is a geometric series and has the sum
$1/(1 - x)$. Hence

$$\int \frac{y}{x}\,dx = a_0 + \frac{1}{1-x}, \qquad \frac{y}{x} = \frac{d}{dx}\left(a_0 + \frac{1}{1-x}\right) = \frac{1}{(1-x)^2},$$

or

$$y = \frac{x}{(1-x)^2}.$$

The series in the text is the special case obtained by putting $x = \tfrac12$. It therefore sums
up to the value 2.


§ 71. Derived Averages and Expectations


From any set of numbers $a_1, a_2, \ldots, a_s$, a host of new
sets can be obtained by various arithmetical processes. In
particular, the sets $a_1^2, a_2^2, \ldots, a_s^2$; $a_1^3, a_2^3, \ldots, a_s^3$; and the
like, can be built up. Obviously each of these derived sets
has its own average, and these derived averages are just as
truly descriptive of the original set as $\bar a$ was. They are the
“average square” of $a$, its “average cube,” and so on. We
denote them, either by $\overline{a^2}, \overline{a^3}, \ldots$, or by $\mu_2(a), \mu_3(a), \ldots$.
Thus in general

$$\overline{a^i} = \mu_i(a) = \frac{1}{m}\sum a_j^i\,n_j. \qquad (84)$$


These quantities are called by the English school of statisticians
the “moments” of the set of numbers $a_j$. The reason for this
name is as follows:


Suppose the $a$’s were represented on a horizontal axis, and that
a weight $n_j$ were suspended from each point $a_j$. The dynamical
moment of these weights about the point zero would be exactly
$\mu_1$; and their “moment of inertia” or second moment would be
$\mu_2$. It is a small generalization to speak of an “$i$th moment”
as well.


The sum of the $i$th powers of the numbers $a_1, a_2, \ldots$ would
not be changed if each of these powers were replaced by $\overline{a^i}$.
Of course, replacing every one of the $a$’s by $\sqrt[i]{\overline{a^i}}$ before raising
them to the $i$th power would have the same effect. This is
called the root mean¹ $i$th power. In particular $\sqrt{\overline{a^2}}$ is the
“root mean square.”

To these “average $i$th powers” or “$i$th moments”
correspond “expected $i$th powers” or “$i$th expectations.” Thus,
by merely using the definition (83) we obtain

$$\epsilon_i(a) = \sum a_j^i\,p(a_j). \qquad (85)$$

The factor $p(a_j)$ need not be altered, for the probability of $a^i$
having the particular value $a_j^i$ is the same as the probability
that $a$ has the value $a_j$.
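
The formulas (84) and (85) are mechanical enough to be put into a few lines of code. The following is a sketch under assumptions of our own: the data set `1, 1, 2, 5`, the single fair die used as a distribution, and the names `moment` and `expectation` are introduced purely for illustration.

```python
from fractions import Fraction

def moment(pairs, i):
    """Formula (84): i-th moment of data given as (a_j, n_j) pairs."""
    m = sum(n for _, n in pairs)
    return Fraction(sum(a**i * n for a, n in pairs), m)

def expectation(dist, i):
    """Formula (85): i-th expectation of a distribution {a_j: p(a_j)}."""
    return sum(Fraction(a) ** i * p for a, p in dist.items())

data = [(1, 2), (2, 1), (5, 1)]          # the numbers 1, 1, 2, 5
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = moment(data, 1)
mean_square = moment(data, 2)
root_mean_square = float(mean_square) ** 0.5

print(mean, mean_square, root_mean_square)
print(expectation(die, 1), expectation(die, 2))
```

The root mean square exceeds the plain average here, as it does whenever the numbers of the set are not all equal.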


¹ “Average” and “mean” are synonyms.


§ 72. Average Values of Continuous Variables


All of the concepts with which the present chapter deals
have exact counterparts in the case of continuous variables.
The principal formal difference lies in the substitution of
integral signs in place of signs of summation.

To begin with, let us suppose that we are dealing with a
variable $x$, the distribution function for which is known to be
$p(x)$. Suppose the total range of variation of this variable
to be divided up into elementary intervals of length $dx$; and
suppose that a large number of independent trials of $x$ are
made. Within any interval $(x_j, x_j + dx)$ will fall $n_j$ of these
values, the sum of which may be denoted by $s_j$. Then it is
obvious that

$$n_j x_j < s_j < n_j (x_j + dx).$$


The sum of all the values of $x$ is obviously obtained by
adding together the sums for the separate intervals. Let it
be denoted by $s$. Then we have

$$\sum n_j x_j < s < \sum n_j (x_j + dx).$$

The average value of $x$, which we denote by $\bar x$, is obtained by
dividing $s$ by the total number of trials made. Hence we
have

$$\sum \frac{n_j}{m}\,x_j < \bar x < \sum \frac{n_j}{m}\,x_j + dx \sum \frac{n_j}{m}. \qquad (86)$$

m m m 
As $m$ becomes infinite the ratio $n_j/m$ is likely to be very nearly
equal to the probability of a value falling within the interval
$(x_j, x_j + dx)$, which is approximately equal to $p(x_j)\,dx$.
Further, if $dx$ is not too large, $\sum p(x_j)\,x_j\,dx$ does not much
differ from $\int p(x)\,x\,dx$, and of course $\sum n_j/m = 1$. Hence we
conclude that the lower bound of the inequality (86) is not
greatly different from $\int x\,p(x)\,dx$, nor the upper bound from
this quantity increased by $dx$. The lower and upper bounds
are therefore almost certainly close together, if $dx$ is small;
and we conclude that $\bar x$ itself is not likely to deviate much
from

$$\epsilon_1(x) = \int p(x)\,x\,dx. \qquad (87)$$

This we call the “first expectation of $x$.” Obviously the
“$i$th expectation” is

$$\epsilon_i(x) = \int p(x)\,x^i\,dx. \qquad (88)$$



As for the experimental averages themselves, we can say
nothing beyond what has already been said: that the chance
of their differing much from the corresponding expectations
is extremely small, if the number of trials is sufficiently large.

Similarly, if we have any function $f(x)$, the expectation of
this function is given by the rule

$$\epsilon_1(f) = \int f(x)\,p(x)\,dx; \qquad (89)$$

for by (87)

$$\epsilon_1(f) = \int f\,p(f)\,df,$$

which reduces immediately to (89) because of the rule¹
$p(f) = p(x)/f'(x)$.

In general, however many variables the quantity $f$ may
depend upon, its $i$th expectation is always given by the law

$$\epsilon_i(f) = \int\!\!\int\!\!\int \cdots \int f^i\,p(x, y, z, \ldots, w)\,dx\,dy\,dz \cdots dw. \qquad (90)$$

We do not stop to demonstrate that this, too, is a consequence
of the definition (87).

As an example, we ask for the “expected energy” in the
case of a gas molecule. Energy is $W = \tfrac12 m s^2$; hence by
(89) and (76)

$$\epsilon_1(W) = \int_0^{\infty} \tfrac12 m s^2\,p^*(s)\,ds = 2m\sqrt{\frac{a^3}{\pi}} \int_0^{\infty} s^4 e^{-as^2}\,ds.$$

If we set $as^2 = z$, this becomes

$$\epsilon_1(W) = \frac{m}{a\sqrt{\pi}} \int_0^{\infty} z^{3/2} e^{-z}\,dz = \frac{m}{a\sqrt{\pi}}\left(\tfrac32\right)! = \frac{3m}{4a}. \qquad (91)$$


¹ To correctly interpret this and similar formulae, the reader must remember that
$p(x)\,dx$ means always “the probability that $x$ lies between $x$ and $x + dx$”, and
$p(f)\,df$ “the probability that $f$ lies between $f$ and $f + df$.” Naturally, the form of
the functions $p(x)$ and $p(f)$ need not be the same.



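The result (91) has been obtained from a formula for the
speed. It could equally well have been found from (73)
directly by the use of formula (90). Thus:

$$W = \frac{m}{2}\,(u^2 + v^2 + w^2).$$

Hence

$$\epsilon_1(W) = \iiint \frac{m}{2}\,(u^2 + v^2 + w^2)\,p^*(u, v, w)\,du\,dv\,dw$$

$$= \frac{m}{2}\left(\frac{a}{\pi}\right)^{3/2} \left\{ \iiint u^2 e^{-a(u^2+v^2+w^2)}\,du\,dv\,dw + \iiint v^2 e^{-a(u^2+v^2+w^2)}\,du\,dv\,dw + \iiint w^2 e^{-a(u^2+v^2+w^2)}\,du\,dv\,dw \right\}.$$

The integrations run from $-\infty$ to $+\infty$ in each case.
The first of the three integrals within the braces is easily
found to be equal to $\frac{1}{2a}\left(\frac{\pi}{a}\right)^{3/2}$. The others are obviously equal
to it because of symmetry. Hence we obtain (91) again for
the expected energy.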
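
The value $3m/4a$ can be confirmed by carrying out the integration of (89) numerically with the speed distribution (76). The following is a sketch only: the sample values of `m` and `a`, the step count and the finite upper limit are assumptions made for the purpose of illustration.

```python
import math

def maxwell_speed_density(s, a):
    """Equation (76): p*(s) = 4 sqrt(a^3/pi) s^2 exp(-a s^2)."""
    return 4.0 * math.sqrt(a**3 / math.pi) * s * s * math.exp(-a * s * s)

def expected_energy(m, a, n=40000):
    """Midpoint-rule value of the integral of (1/2) m s^2 p*(s) ds."""
    upper = 12.0 / math.sqrt(a)     # stands in for infinity
    h = upper / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += 0.5 * m * s * s * maxwell_speed_density(s, a)
    return total * h

m, a = 2.0, 0.8                     # hypothetical mass and constant a
numeric = expected_energy(m, a)
closed_form = 3.0 * m / (4.0 * a)   # the result (91)

print(numeric, closed_form)
```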


If we noted the energy of our tagged molecule at a large number
of instants far enough apart that they constituted “independent”
observations, we could obtain an “average energy,” which
is nearly certain to be almost equal to $\epsilon_1(W)$. Likewise, if we were
to observe all the molecules in the gas at once, they would constitute
a large set of “trials.” If those trials were independent (that is,
if the chance of one having a specified W were independent of what
we may have found to be true of the rest¹), their average energy
would probably not differ greatly from $\epsilon_1(W)$. For this reason
physicists usually speak of $\epsilon_1(W)$ as the “average energy of the
molecules.”


¹ They are not independent in fact, for the sum of the energies of all is determined
by the principle of the conservation of energy.




§ 73. The Median 


The median of a set of numbers is that number which 
occupies the central position when the sequence is arranged 
in order of magnitude. In other words, there are just as many 
numbers in the sequence larger than the median as there are
smaller than the median. 

Strictly speaking, this definition implies that the number 
of terms in the sequence is odd. If the number of terms is 
even, either of the two adjacent numbers, or, better still, their 
average, can be taken as the median. Due to their arrange- 
ment in order of magnitude, it will generally make little dif- 
ference which of these conventions is adopted. 

As an example, the median of the set of numbers

−28, −8, −8, −1, −1, +28, +56, +56, +68

is −1, since there are just four numbers greater and four
numbers less than (or equal to) −1. The average of the
sequence, on the other hand, is one-ninth of the sum of the
terms, which turns out to be 18.
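The contrast between median and average is easy to reproduce. The short sketch below uses Python's standard `statistics` module and assumes that the nine-number set of the example reads −28, −8, −8, −1, −1, 28, 56, 56, 68, the reading that is consistent with the stated median and average.

```python
from statistics import mean, median

# Assumed reading of the nine-number set in the example.
values = [-28, -8, -8, -1, -1, 28, 56, 56, 68]

print(median(values))   # middle value of the sorted list
print(mean(values))     # one-ninth of the sum of the terms
```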

To the “median” corresponds an “expected median.”
For suppose the chance of a taking the value $a_j$ is $p(a_j)$. A
long series of trials will result in $n_1$ $a_1$'s, $n_2$ $a_2$'s, and so on.
Let these be arranged in order, and call the middle one $a_i$.
Then, obviously

$$n_1 + n_2 + \ldots + n_{i-1} < \frac{m}{2}, \qquad n_1 + n_2 + \ldots + n_i \ge \frac{m}{2}.$$

But in the long run $n_j/m$ is not likely to differ much from
$p(a_j)$; so if we divide these inequalities by m throughout they
immediately suggest the relations

$$p(a_1) + p(a_2) + \ldots + p(a_{i-1}) < \tfrac12, \qquad p(a_1) + p(a_2) + \ldots + p(a_i) \ge \tfrac12.$$

This is our definition of the expected median.




Hence: The expected median $a_i$ is such a number that there
is less than an even chance of either a or $a_i$ being greater than the
other.

For example, in tossing two dice the chances of various sums
are as follows:

Sum          2     3     4     5     6     7     8     9     10    11    12
Probability  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

The expected median is therefore 7, for the chance of the sum
exceeding 7 is 15/36, and the chance of 7 exceeding the sum is
also 15/36.
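The dice example can be reproduced with exact fractions. The sketch below is one possible reading of the definition: it returns the smallest value at which the running total of probabilities reaches one half. The function name `expected_median` and the dictionary layout are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def expected_median(dist):
    """Return the smallest value whose cumulative probability reaches 1/2.

    dist maps each possible value to its probability.
    """
    running = Fraction(0)
    for value in sorted(dist):
        running += dist[value]
        if running >= Fraction(1, 2):
            return value
    raise ValueError("probabilities do not sum to 1")

# Sum of two dice: there are 6 - |s - 7| ways out of 36 to throw the sum s.
two_dice = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

m = expected_median(two_dice)
above = sum(p for s, p in two_dice.items() if s > m)   # chance the sum exceeds m
below = sum(p for s, p in two_dice.items() if s < m)   # chance m exceeds the sum
print(m, above, below)
```

Both `above` and `below` come out smaller than one half, which is the property the text attaches to the expected median.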

The fundamental idea is that of a number which we can expect
to be exceeded as often as not; but its own chance of repeated
occurrence spoils so simple a definition when the variable is
capable of only a discrete set of values. On the other hand,
if the variable is capable of continuous variation this definition
is strictly true, for in that case the chance of the variable
taking exactly its median value is zero. By a line of argument
exactly similar to that used in § 72 we can show that the
expected median of x is that value $x_m$ for which

$$\int_{x_m}^{\infty} p(x)\, dx = \tfrac12.$$

Of course the integral of p(x) between the limits −∞ and
$x_m$ is also ½.


§ 74. The Deviation 


So far we have introduced three fundamental probability 
concepts: that of the “most probable” result, of the 
“expected” result, and of the “expected median”; and 
have intimated that a host of derived, or secondary, concepts 
are possible, of which “expected ith powers” have been
specifically mentioned. These derived concepts are seldom 
used in practical studies, however, except in connection with 
the “ deviations” of a set of numbers from the average of the 
set. 




Thus, to the set of numbers $a_1, a_2, \ldots, a_s$ repeated
$n_1, n_2, \ldots, n_s$ times, respectively, there corresponds an average
$\bar a$ and a set of deviations from that average

$$d_1 = a_1 - \bar a, \quad d_2 = a_2 - \bar a, \quad \ldots, \quad d_s = a_s - \bar a, \tag{92}$$

each of which recurs as often as the corresponding a. Obviously
the set of d's has its own average, median, and the like.
Of these, two are especially important:

The average of a set of d's is zero.

By definition $\bar d$, or $\mu_1(d)$, is

$$\mu_1(d) = \frac{1}{m} \sum_j d_j n_j.$$

But from (92)

$$\sum_j d_j n_j = \sum_j a_j n_j - m \bar a,$$

and by (82) $\sum_j a_j n_j = m \bar a$. Hence

$$\mu_1(d) = 0. \tag{93}$$

The mean square deviation of a set of numbers is equal to
the mean square of the set diminished by the square of their
average.

For

$$\mu_2(d) = \frac{1}{m} \sum_j n_j (a_j - \bar a)^2 = \frac{1}{m} \sum_j n_j a_j^2 - \frac{2 \bar a}{m} \sum_j n_j a_j + \bar a^2.$$

But by the use of (82) and (84) this reduces immediately to¹

$$\mu_2(d) = \overline{a^2} - \bar a^2. \tag{94}$$


¹ This equation could obviously be written in either of the forms
$\overline{d^2} = \overline{a^2} - \bar a^2$ or $\mu_2(d) = \mu_2(a) - [\mu_1(a)]^2$.
Similarly (93) could be written $\bar d = 0$.




The square root of this quantity $\mu_2(d)$, which we would
naturally call the “root mean square deviation,” is generally
referred to in statistical works as the “standard deviation.”

Finally, if a set of numbers $a_1, a_2, \ldots, a_s$ have an expectation
$\epsilon_1(a)$, the quantities

$$\delta_1 = a_1 - \epsilon_1, \quad \delta_2 = a_2 - \epsilon_1, \quad \ldots, \quad \delta_s = a_s - \epsilon_1 \tag{95}$$

are their deviations from expectation. The probabilities of
these deviations being the same as the probabilities of the
original a's, it follows that

$$\epsilon_1(\delta) = \sum_j p(a_j)(a_j - \epsilon_1) = \sum_j a_j\, p(a_j) - \epsilon_1 \sum_j p(a_j) = \epsilon_1 - \epsilon_1 \sum_j p(a_j).$$

As the summation covers every possible value of $a_j$, including
zero, the set of a's is complete, and $\sum_j p(a_j) = 1$. Hence

$$\epsilon_1(\delta) = 0. \tag{96}$$

This corresponds to (93).

Similarly

$$\epsilon_2(\delta) = \sum_j p(a_j)(a_j - \epsilon_1)^2 = \sum_j a_j^2\, p(a_j) - 2\epsilon_1 \sum_j a_j\, p(a_j) + \epsilon_1^2 \sum_j p(a_j) = \epsilon_2(a) - [\epsilon_1(a)]^2. \tag{97}$$

This corresponds to (94).

It is obvious that all the formule of this section apply to 


continuous variables as well as to those which are capable of 
taking only a discrete set of values. 
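The two properties of the deviations, that their average is zero and that their mean square equals the mean square of the set less the square of its average, can be verified on any small data set. The sketch below uses exact fractions; the values, repetition counts and the helper name `moment` are purely illustrative.

```python
from fractions import Fraction

def moment(values, counts, i):
    """i-th moment: (1/m) * sum of n_j * a_j**i, where m is the sum of the n_j."""
    m = sum(counts)
    return Fraction(sum(n * a**i for a, n in zip(values, counts)), m)

a = [1, 2, 4, 7]    # illustrative values a_j
n = [3, 1, 4, 2]    # illustrative repetition counts n_j

average = moment(a, n, 1)
d = [x - average for x in a]        # deviations from the average

print(moment(d, n, 1))              # average of the deviations
print(moment(d, n, 2), moment(a, n, 2) - average**2)   # two forms of the mean square deviation
```

The same check carries over to deviations from expectation if the counts divided by m are replaced by probabilities.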




§75. Résumé 


Most of the concepts which we have introduced in this
chapter deal with quantities which may take various magnitudes.
This is, in a way, a limitation of the general idea of
an “event” and its probability, for not every aspect of an
“event” is numerically measurable. We now frankly limit
ourselves to such things as are measurable, however, and speak
for the future only of the probability of variables taking
various values. What we have so far learned then takes the
following form:


Concepts Associated with the Results of Experiment

(1) [Of a finite set of numbers, the one which occurs most
frequently is called the “mode.”]¹

(2) If the numbers are arranged in order of magnitude, the one
which occupies the middle position is the “median.”

(3) That number by which every number of the set could be
replaced without changing their sum is the “average” or “mean”
of the set.

(4) If each number is reduced by an amount equal to the average
of the set, the resultant numbers are the “deviations from the
average.”

(5) The “mean deviation” is zero.

(6) If each deviation of the set is raised to the ith power, the
average of the result is called “the ith moment” of the deviations.


Concepts Associated with Advance Judgment

(1) Of a finite set of numbers, one is the “most probable.” (It is
also called the “mode” by some writers.)

(2) There is a value which the variable has at most an even chance
of exceeding, and at least an even chance of either equalling or
exceeding. It is the “expected median.”

(3) There is an average per trial which a number is most likely to
show after m independent trials have been made. This average
approaches a limiting value as m is indefinitely increased. This
limiting value is the “expectation” of the number.

(4) If every possible value is reduced by an amount equal to the
expectation of the set the results are the “deviations from
expectation.”

(5) The “expected deviation” is zero.

(6) If every possible deviation from expectation is raised to the
ith power, the resultant set of numbers possesses its own
expectation. This is called the “ith expectation” of the deviations.


¹ The statement in brackets is not made in the text.


§76. Some Instructive Illustrations; The General Case of 
Independent Trials 


Let us find the expected number of successes in m independent
trials of an event, when the probability of its occurrence
on a single trial is p.

The formula for the probability of n successes is

$$p(n) = C^m_n\, p^n (1 - p)^{m-n}.$$

As the quantity in which we are interested is the number of
occurrences of the event, $a_n = n$. The expectation of n then
becomes

$$\epsilon_1(n) = \sum_{n=0}^{m} n\, p(n) = \sum_{n=0}^{m} n\, C^m_n\, p^n (1 - p)^{m-n}.$$

But

$$n\, C^m_n = m\, C^{m-1}_{n-1}$$

and

$$m - n = (m - 1) - (n - 1).$$

Hence

$$\epsilon_1(n) = mp \sum_{n=0}^{m} C^{m-1}_{n-1}\, p^{n-1} (1 - p)^{(m-1)-(n-1)}.$$

This summation, however, is of exactly the form¹ (15), except
that m is replaced by m − 1 and n by n − 1. Hence the
sum of all the terms is $[p + (1 - p)]^{m-1} = 1$. It therefore
follows that $\epsilon_1(n) = mp$. In this case the expected number of
successes and the most probable number come out to be equal,
except that the expected number may be fractional, whereas
the most probable number is necessarily an integer (see § 38).


¹ There is one term too many, but this term turns out to be zero because of the
factor $C^{m-1}_{-1}$, which vanishes.
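The result that the expected number of successes is mp can be confirmed by carrying out the defining summation directly. The sketch below does so in exact arithmetic with Python's `math.comb`; the function name and the particular values of m and p are illustrative only.

```python
from fractions import Fraction
from math import comb

def expected_successes(m, p):
    """Sum n * C(m, n) * p**n * (1 - p)**(m - n) over n = 0, 1, ..., m."""
    return sum(n * comb(m, n) * p**n * (1 - p)**(m - n) for n in range(m + 1))

m, p = 10, Fraction(1, 6)    # e.g. ten throws of a die, success = an ace
print(expected_successes(m, p), m * p)
```

Here the expectation is fractional, so the most probable number of successes can only be one of the neighbouring integers.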




§77. Some Instructive Illustrations; The General Case of 
Dependent Trials of the Sort Discussed in § 27 


The discussion in § 27 concerned itself with the probability
of drawing just p red and q black balls from an urn which
contained m red and n black balls, assuming that after a ball
had been withdrawn it was not replaced before the next drawing
was made. The general result was given by equation (25):

$$P(p, q) = \frac{C^m_p\, C^n_q}{C^{m+n}_{p+q}}. \tag{25}$$

It is desired to find the number of red balls which may be
expected to be drawn in p + q = r trials.

In this case the quantity under discussion, the number
of red balls, is obviously measured by p itself. In other
words, $a_p = p$. Moreover, since p + q is equal to the constant
r, q is also a function of p and must be replaced by its value
r − p. Thus we find

$$\epsilon_1(p) = \sum_{p=0}^{r} \frac{p\, C^m_p\, C^n_{r-p}}{C^{m+n}_r}.$$

Noting that $C^{m+n}_r$ does not vary from term to term of the
summation, and replacing $p\, C^m_p$ by $m\, C^{m-1}_{p-1}$, this becomes

$$\epsilon_1(p) = \frac{m}{C^{m+n}_r} \sum_{p=0}^{r} C^{m-1}_{p-1}\, C^n_{r-p}.$$

The sum of binomial coefficients contained in this equation is
in the form (26), as can easily be seen by writing $C^n_{(r-1)-(p-1)}$
in place of $C^n_{r-p}$. Hence

$$\epsilon_1(p) = \frac{m\, C^{m+n-1}_{r-1}}{C^{m+n}_r},$$

which works out to be

$$\frac{mr}{m + n}.$$

As in the last example, when this result is integral, it is
also the most probable number of red balls, and when it is
fractional the most probable number of red balls is one of the
adjacent integers.
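The closed form mr/(m + n) for drawings without replacement can be checked against the defining sum. The sketch below is a minimal illustration in exact arithmetic; the function name `expected_red` and the urn contents used are hypothetical.

```python
from fractions import Fraction
from math import comb

def expected_red(m, n, r):
    """Expected number of red balls in r draws, without replacement,
    from an urn holding m red and n black balls."""
    total = comb(m + n, r)
    return sum(Fraction(p * comb(m, p) * comb(n, r - p), total)
               for p in range(r + 1))

m, n, r = 5, 7, 4    # illustrative urn: 5 red, 7 black, 4 draws
print(expected_red(m, n, r), Fraction(m * r, m + n))
```

`math.comb` returns zero when more balls are asked for than the urn holds, so impossible terms drop out of the sum automatically.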




§ 78. Some Instructive Illustrations; A Dice Problem 


In the illustrations considered in the last two sections the
“expectation” was either equal to or adjacent to the “most
probable” result, according to whether both results were
integral or not. The following example serves to show that
this condition does not always exist.


Example 44. A die is tossed until an ace appears. What is
the most probable number of throws, and what is the expected number
of throws?

The chance that the ace appears on the first throw is
$p_1 = \tfrac16$. The chance that it does not appear on the first throw
but appears on the second is $p_2 = \tfrac56 \cdot \tfrac16$. The chance that it
does not appear on either of these throws but appears on the
third is $p_3 = (\tfrac56)^2 \cdot \tfrac16$, and in general, the chance that it appears
for the first time on the jth throw is $p_j = (\tfrac56)^{j-1} \cdot \tfrac16$. It is
obvious that the largest of these probabilities is $p_1$. Hence
the most probable number of throws is 1.

The expected number of throws, on the other hand, is
$\sum_{j=1}^{\infty} j\, p_j$, or

$$\epsilon_1(j) = \tfrac16 \left[ 1 + 2\left(\tfrac56\right) + 3\left(\tfrac56\right)^2 + \ldots \right].$$

But from the footnote at the end of § 70

$$\frac{1}{(1 - x)^2} = 1 + 2x + 3x^2 + \ldots.$$

Comparing this with $\epsilon_1(j)$ we easily get the result

$$\epsilon_1(j) = 6.$$

In other words, the expected number of throws is six, though
the most probable number is one.
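Both answers of Example 44 can be seen numerically. The sketch below evaluates the probabilities of the first ace falling on each throw, picks out the largest, and adds up a long partial sum of the expectation series; the cut-off of 200 terms and the function names are arbitrary illustrative choices.

```python
from fractions import Fraction

def first_ace_probability(j):
    """Chance that the first ace appears on throw j of a fair die."""
    return Fraction(5, 6) ** (j - 1) * Fraction(1, 6)

def partial_expectation(terms):
    """Sum of j * p_j over the first `terms` throws."""
    return sum(j * first_ace_probability(j) for j in range(1, terms + 1))

most_probable = max(range(1, 50), key=first_ace_probability)
print(most_probable)                      # throw with the largest probability
print(float(partial_expectation(200)))    # partial sum of the expectation series
```

The partial sums increase toward 6, while the single most likely outcome remains the first throw.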




§ 79. Some Instructive Illustrations; The St. Petersburg Paradox 


As a final illustration of this sort of computation we take
the following very famous problem, the apparently absurd
solution of which has puzzled men for generations:


Example 45. A penny is tossed until heads appears. If heads
appears on the first throw the bank pays the player one dollar. If heads
appears for the first time on the second throw the player receives two
dollars. If heads appears for the first time on the third throw he receives
four dollars. If it appears for the first time on the fourth throw he
receives eight dollars, and so on. What should the player pay the bank
for the privilege of playing a sequence of this sort in order that the game
may be equitable?

The probability of heads appearing for the first time on the
nth throw is $p_n = (\tfrac12)^n$. If so the player receives an amount
$a_n = 2^{n-1}$. Substituting these values in (83) it becomes

$$\epsilon_1(a) = \sum_n p_n a_n = \tfrac12 + \tfrac12 + \tfrac12 + \ldots.$$

This sum is to be extended to every possible value of n.
However, there is no logical limit to the number of tails which
may appear before the first head shows up. It is possible for
heads to appear first on the millionth throw, although, of
course, the probability of any such thing occurring is extraordinarily
small. This means, of course, that an infinity of
½'s must be added together, with the consequence that
$\epsilon_1(a) = \infty$. In other words, in order to play a sequence of
this sort fairly to the bank, the player must first deposit with
the bank an infinite amount of money.

From a common sense standpoint this result is absurd. No 
sane man would ever consider paying the bank one hundred 
dollars for such a chance, much less an infinite amount. And 
yet the mathematics itself is straightforward. It is certainly 
no more questionable than the mathematical processes used 
throughout the remainder of this book. If, therefore, the 
result is incorrect, it throws suspicion upon the entire structure 
of Probability Theory. It is therefore essential to know why 
the result does not agree with common sense. 

A number of answers have been given to this question. 
Probably the first historically is that given by Daniel Bernoulli, 
who distinguished between what he termed “ mathematical” 
and “ moral” expectation. The former he defined exactly as 
above, but differentiated the latter from it by including 
certain psychological considerations of the following sort: 

A dollar is worth more to a beggar than it is to a millionaire. 
In fact, says Bernoulli, the satisfaction which one gets out
of any acquired sum of money is less and less, the greater the
amount of money which he already has. When, therefore, in 
the game under consideration, the player pays out from his 
moderate fortune a certain sum of money, he pays out some- 
thing the moral value of which is comparatively large. He 
has a chance — though a very small chance — of winning an 
enormous amount of money, provided heads is long delayed. 
Suppose he is fortunate and does win this enormous amount 
of money. He is then a very wealthy man, and his winnings 
acquire a moral value based upon his new, rather than upon 
his old, economic standing. In other words, his losses, being 
based upon a comparatively low economic standing, loom 
larger in his estimation than do his winnings. 

By introducing the law that the relationship between the 
moral and mathematical expectations is of a logarithmic 
nature, the computation of the amount which he should pay 
can be carried out and serves to give a result which is some- 
where within the bounds of reason. It possesses its own 
common-sense absurdity, however, in the fact that the amount 
which the player should pay to the bank depends upon the 
player’s own economic standing as well as upon what he can 
expect to win, and is therefore different for different men, 
even though they play against the same bank. What the 
banker would say to such an arrangement is quite obvious. 
As this is certainly not the true way out of the dilemma it is 
not necessary to go into the mathematics of it. 

Another explanation which has been favored by many
statisticians is based upon the fact that, if the problem is
slightly modified, it gives a result which is not quite so absurd.
To see this, suppose that instead of using the set of values
$a_1 = 1, a_2 = 2, a_3 = 4, \ldots, a_n = 2^{n-1}$, the set were taken
as $a_1 = 1, a_2 = x, a_3 = x^2, \ldots, a_n = x^{n-1}$, where x < 2.
Substituting these values (along with the values of $p_j$, which
remain unchanged) in (83) it takes the form

$$\epsilon_1(a) = \frac12 + \frac12\left(\frac{x}{2}\right) + \frac12\left(\frac{x}{2}\right)^2 + \ldots = \frac{1}{2 - x}.$$

This expression is finite. For instance, if x is 1.5 instead
of 2, the player should pay the bank two dollars for the privilege
of playing a sequence, which seems reasonable. However,
as x approaches the value 2 the sum gets larger and
larger, and from the mathematical standpoint ultimately
becomes infinite.
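The behaviour of the modified game as x approaches 2 can be tabulated directly. The sketch below sums the series term by term and sets it beside the closed form 1/(2 − x) of the geometric series; the function name, the number of terms and the sample values of x are illustrative assumptions.

```python
def modified_expectation(x, terms=100000):
    """Partial sum of (1/2)**n * x**(n - 1) for n = 1, ..., terms, with x < 2."""
    total, term = 0.0, 0.5
    for _ in range(terms):
        total += term
        term *= x / 2.0     # each term is x/2 times the one before it
    return total

for x in (1.5, 1.9, 1.99):
    print(x, modified_expectation(x), 1.0 / (2.0 - x))
```

For x = 1.5 the fair payment is two dollars, while for x = 1.99 it has already grown to about one hundred dollars.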

The second explanation of the paradox makes use of these 
facts, and then concludes that we have no intuitive sense of 
the immensity of the difference between the expectation when
x < 2, and the expectation when x = 2, and therefore are
unwilling to pay the amount of money which is logically called 
for. This explanation, too, is open to objection, however; 
for if x were taken as 1.99 instead of 2 (so that instead of 
paying $2.00 if heads appears on the second throw the bank 
would pay only $1.99) the amount which the player should 
pay the bank works out to be $100, and no sensible man could 
be induced to pay this amount for playing the modified game. 
Hence, if the trouble is with our intuition, that intuition must 
go wrong even when dealing with numbers which deviate 
from $2.00 by amounts which the department stores have 
done their utmost to make familiar to us. 

I believe the true explanation of the paradox is quite 
different from either of these, and is based upon the fact 
that in our every-day experience we have to deal only with 
individuals who have finite fortunes and who would therefore 
be incapable of paying back the sums which are required in 
those very rare cases where heads appears only after an 
extremely long run of tails. To see what the effect upon 
the mathematical expectation is, if the bank has limited 
wealth, consider the following alternative form of the 
problem: 


Example 46. What is the equitable payment for playing in the
game described in Example 45, if the bank's wealth is limited to
$1,000,000?

The probability of heads appearing first on the nth throw
is $(\tfrac12)^n$, as before. If so, the bank pays $2^{n-1}$ dollars if this is less
than $1,000,000, otherwise it pays $1,000,000. In other
words,

$$p_n = \frac{1}{2^n}, \quad a_n = 2^{n-1}, \quad \text{if } 2^{n-1} < 1{,}000{,}000;$$

$$p_n = \frac{1}{2^n}, \quad a_n = 1{,}000{,}000, \quad \text{if } 2^{n-1} > 1{,}000{,}000.$$

Now $2^{19} < 1{,}000{,}000$ and $2^{20} > 1{,}000{,}000$. Hence the
first set of conditions applies for n ≤ 20; the second set for
n > 20. Thus (83) becomes:

$$\epsilon_1(a) = \sum_{n=1}^{20} \left(\tfrac12\right)^n 2^{n-1} + \sum_{n=21}^{\infty} \left(\tfrac12\right)^n 1{,}000{,}000.$$

The first summation obviously gives 10. The second is a
geometric series, of which the sum is

$$\frac{1{,}000{,}000}{2^{20}} = 0.9536.$$

In order for play to be equitable against a million-dollar bank,
therefore, the player should pay $10.95,¹ a reasonable amount.
If the bank had a billion dollars, the payment would be
less than $16; while if the bank had $1,000,000,000,000
(probably more than the total economic value of the earth)
the payment would be less than $21.

Taking the other extreme, if the bank had only $100 capital,
the payment would be $4.28. If it had $10, it would be
$2.63, while if it had only $1 (in which case the payment
would be $1, no matter how many tails appeared before the
first head), the payment would be $1, as it should.
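The effect of the bank's limited wealth is easily tabulated. The sketch below caps every payout at the bank's wealth and sums the expectation series; it is only an illustration, in which the name `fair_price` and the cut-off of 200 throws are assumptions (the neglected tail is far below a cent for every wealth shown).

```python
def fair_price(bank_wealth, max_throws=200):
    """Expectation of min(2**(n - 1), bank_wealth) when p_n = (1/2)**n."""
    total = 0.0
    for n in range(1, max_throws + 1):
        payout = min(2 ** (n - 1), bank_wealth)   # the bank cannot pay more than it has
        total += payout / 2 ** n
    return total

for wealth in (1, 10, 10**6, 10**9, 10**12):
    print(wealth, f"{fair_price(wealth):.4f}")
```

Multiplying the bank's wealth by a thousand adds only about five dollars to the fair price, since each doubling of the cap contributes roughly another half dollar.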


¹ All these numbers are based upon the bank’s wealth after receiving the payment,
not before. In the case of the million-dollar bank the difference is negligible; in some 
of the figures which follow, however, it is not. 




I believe this to be the true explanation of the paradox. 
If the bank were infinitely wealthy, the expectation would be 
infinite. Therefore the mathematics is correct in either case. 
But we are accustomed to deal only with “limited wealth ” 
and cannot conceive of “unlimited amounts of money.” 

In other words, our intuition is in error, rather than the 
theory; but only because the theory deals with material for 
which we have had no opportunity to build up intuitive 
judgments. 


§ 80. The Expectation of a Probability 


The concept of an expectation can be applied to any number 
which may be determined from the result of an experiment. Among 
the things which may be determined about an experiment is, however, 
the a priori probability of the exact result which has been obtained. 
For example, if we toss two dice we determine a certain sum from 
2 to 12, according to the faces which happen to appear, each of 
which sums has a definite probability of appearance, as we saw in 
Problem 3, § 47. The exact relationship is 


Sum          2     3     4     5     6     7     8     9     10    11    12
Probability  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36


If, then, our experiment gives us a sum 4, we have a result of which
the a priori probability was 3/36; while if it gives a sum 12 we have
something of which the a priori probability was only 1/36.

These numbers measure the unusualness of our result: if they 
are very small, our result is extraordinary, while if they are large it 
is not at all so. If we were to make a large number of trials, we 
could find the “‘ average unusualness”’ of our results by averaging 
the a priori probabilities of the results obtained.

Similarly, we may compute the “ expectation of the probability,” 
just as we may compute the expectation of any other number, and 
thus find a sort of theoretical measure of the degree to which the 
result of an experiment may be expected to be unusual. 

Let us take as an example the numbers given above. The chance
of getting a sum 2 is 1/36; if we do, the number with which we are
dealing (the probability) is also 1/36. So this event contributes a
term $(1/36)^2$ to our expectation. Similarly the chance of getting a
sum 3 is 2/36, which contributes a term $(2/36)^2$ to our expectation. In
general, any event which has the probability p contributes a term $p^2$
to the expectation of p; for the number of which we are computing the
expectation is p, and its probability of occurrence is also p. In our
particular illustration the sum is therefore

$$\left(\tfrac{1}{36}\right)^2 + \left(\tfrac{2}{36}\right)^2 + \left(\tfrac{3}{36}\right)^2 + \ldots + \left(\tfrac{2}{36}\right)^2 + \left(\tfrac{1}{36}\right)^2 = 0.1126.$$

In other words, we may expect the experiment to give a result of which
the probability is a little greater than 1/9.

We shall now prove that the expectation of the probability of a com- 
plete set of events is least when the events are equally likely. 

Suppose the events are s in number, and that their respective
probabilities are $p_1, p_2, \ldots, p_s$. Then the expectation of their
probability is

$$\epsilon(p) = p_1^2 + p_2^2 + \ldots + p_s^2,$$

where the p's must satisfy the condition

$$p_1 + p_2 + \ldots + p_s = 1,$$

since the set is complete. Solving the latter of these equations for
$p_s$ and substituting the result in the other, we get

$$\epsilon(p) = p_1^2 + p_2^2 + \ldots + p_{s-1}^2 + (1 - p_1 - p_2 - \ldots - p_{s-1})^2.$$

In this equation the variables are all independent. We may therefore
find the minimum value of ε by differentiation in the usual way.
Differentiating with respect to each of the variables in turn we get
expressions of the form

$$\frac{\partial \epsilon}{\partial p_i} = 2p_i - 2(1 - p_1 - p_2 - \ldots - p_{s-1}) = 2(p_i - p_s).$$

In order that all these expressions may be zero it is necessary that each
$p_i$ be equal to $p_s$: that is, that all the p's must be equal. This proves
the theorem.

As an illustration, the eleven sums that may be obtained in
tossing two dice are not equally likely. We have seen that the
expectation of their probability is 0.1126, which, according to the
theorem, should be bigger than if they were equally likely. If they
were all equally likely, however, the probability of each would be 1/11;
and since p could take no other value than this, its expectation must
also be 1/11. This is indeed less than 0.1126.
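The comparison just made can be reproduced in a few lines. The sketch below computes the expectation of the probability, that is, the sum of the squares of the probabilities, for the eleven sums of two dice and for eleven equally likely events; the function name is an illustrative choice.

```python
from fractions import Fraction

def expectation_of_probability(probs):
    """Each event of probability p contributes p * p to the expectation of p."""
    return sum(p * p for p in probs)

two_dice = [Fraction(6 - abs(s - 7), 36) for s in range(2, 13)]
uniform = [Fraction(1, 11)] * 11

print(float(expectation_of_probability(two_dice)))   # dice sums
print(expectation_of_probability(uniform))           # eleven equally likely events
```

Any other complete set of eleven probabilities may be substituted for `two_dice`; according to the theorem the result will never fall below 1/11.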




A similar theorem can be proved for the case of continuously distributed
variables. It reads: The expectation of p(x) is least if x is
distributed at random. We shall not stop to prove it.

Instead, we may observe that the sort of computation which
we have just carried out for p could equally well be carried out for
some function of p; for any experiment which fixes a value of p
also fixes the value of any function of p. Later on, in our discussion
of the Kinetic Theory of Gases, we shall find that the expectation of
the logarithm of p plays an important part, and in anticipation of
our needs in that connection we may prove the following variation
upon the theorem which we have just stated:


The expectation of log p(x) is a minimum if the variable x is
distributed at random.


For this proof we make use of (89), the function f(x) being for
our present purposes log p(x). Our problem, then, is to make

$$\epsilon[\log p(x)] = \int p(x) \log p(x)\, dx$$

a minimum, subject to the condition that

$$\int p(x)\, dx = 1.$$

This is a problem in the Calculus of Variations, and specifically
a problem of what is known as the “isoperimetric type.” It would
carry us too far afield to attempt to explain the theory that underlies
such problems, so we shall content ourselves with giving a categorical
statement of the rule to be followed in their solution, and showing
that the application of that rule leads to the result stated in our
theorem.

The rule is as follows: If it is desired to find a function p(x) which
will make the integral

$$F = \int f(p)\, dx$$

a minimum, subject to the condition that the integral

$$G = \int g(p)\, dx$$

must always be kept constant, it is only necessary to make the
quantity

$$F - \lambda G = \int [f(p) - \lambda g(p)]\, dx$$

a minimum without any restrictive conditions whatever. The λ is a
constant the value of which can usually be found after the form of
p(x) has been determined. Applied to our problem, this rule requires
that we make the integral

$$I = \int p\, (\log p - \lambda)\, dx$$

a minimum.
Suppose, now, that we had somehow found the solution to the
problem, and that it was

$$p = P(x).$$

Since this function makes I a minimum, it follows that any change
we might make in the form of P(x) must of necessity increase I. For
example, if we made a small change to the new function

$$p = P(x) + \delta(x)$$

this would have to be true. But if we substitute this result in I
we obtain

$$\int [P + \delta]\, [\log (P + \delta) - \lambda]\, dx,$$

which is approximately equal to

$$\int P (\log P - \lambda)\, dx + \int (\log P + 1 - \lambda)\, \delta\, dx + \frac12 \int \frac{\delta^2}{P}\, dx + \ldots,$$

since δ is very small.

Now, it is easy to show that unless the quantity by which δ is
multiplied in the second integral is zero we can make the entire result
smaller than the first term; which is absurd, since the first term is a
minimum by hypothesis. To see this, we remember that δ(x) is a
purely arbitrary function of x. If we were to choose it so that it is
negative wherever log P + 1 − λ is positive, and vice versa, the second
integral would obviously be negative. And if we were to choose
it small enough, the remaining terms would be negligible. The sum
of all these terms would then be negative. As this is impossible by
hypothesis, it follows that

$$\log P = \lambda - 1,$$

which is a constant. However, if log P(x) is a constant, P(x) must
also be. That is, the variable x must be distributed at random,
which proves our theorem.
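A discrete analogue of this theorem can be tried numerically. The sketch below is not the variational argument of the text: it replaces the integral of p log p by the sum of p log p over a finite number of equally wide cells (an assumption made purely for illustration) and compares the uniform assignment with a thousand randomly chosen ones.

```python
import math
import random

def expected_log_probability(probs):
    """Discrete analogue of the integral of p log p: the sum of p * log(p)."""
    return sum(p * math.log(p) for p in probs if p > 0)

def random_distribution(cells, rng):
    """A random set of positive probabilities summing to 1."""
    weights = [rng.random() + 1e-9 for _ in range(cells)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)      # fixed seed so that the run is repeatable
cells = 8
uniform_value = expected_log_probability([1.0 / cells] * cells)
trials = [expected_log_probability(random_distribution(cells, rng))
          for _ in range(1000)]
print(uniform_value, min(trials))
```

None of the random assignments gives a smaller value than the uniform one, for which the sum equals minus the logarithm of the number of cells.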


PROBLEMS 


1. In § 76, $\epsilon_1(n)$ was found for the case of independent trials. By
transformations which are exactly similar to those used there, the
summations defining $\epsilon_2(n)$, $\epsilon_3(n)$, ... can be reduced to one or more
terms of the form (15). Find $\epsilon_2(n)$ and $\epsilon_3(n)$.


2. Find $\epsilon_1(\delta)$, $\epsilon_2(\delta)$, $\epsilon_3(\delta)$.


3. Toss a penny ten times and record the number of heads
appearing. Call it x. Repeat the experiment until you have
50 values of x. With these experimental results find their average,
the set of deviations d, and the three moments $\mu_1(d)$, $\mu_2(d)$, $\mu_3(d)$.

(It will save time to take ten coins and toss them at once. The
number of heads can then be counted after each throw.)


4. Find the “expectation of x,” and the first three expectations
of δ. The results of Problem 2 will aid you.


5. Suppose the game described in Example 45, § 79, is altered so
that if heads do not appear within ten throws the bank captures the
stakes, and a new game begins. What is the player’s expectation of
gain?


6. Equation (77) is the “Normal Probability Law.” Find the
“most probable velocity” in the x-direction.


7. Find σ, the “standard deviation” of u.


8. Suppose σ were adopted as the unit of velocity, and denote
velocities measured in this new system by u′. Find p(u′). We refer,
of course, to the velocities in the x-direction only.


9. Find the first three expectations of δ under the conditions
of Problem 8. You should not need to compute the first two.



10. In Example 45 the bank has $1,000,000; but instead of paying
$2^{n-1}$ dollars if heads appears first on the nth throw, it pays $(1.99)^{n-1}$.
What is a fair price per game? What is the fair price if the bank
pays $(1.5)^{n-1}$ dollars? Compare these values with the results obtained
in § 79 upon the assumption that the bank’s wealth was infinite.


11. Ten dice are tossed together, the experiment being repeated 
fifty times. What is the expectation of the number of times three 


aces appear? 


12. If an experiment produces two numbers a and b, and if the
value of a which appears is independent of the value of b, show
that the expectation of their product is the product of their
expectations.


13. The face cards are discarded from two packs, and thereafter 
one card is dealt from each. What is the expectation of the product 
of the numbers appearing on them? 


14. If an experiment produces two numbers a and b, show that
$\epsilon(a + b) = \epsilon(a) + \epsilon(b)$.


15. The face cards are discarded from a single pack, and then 
two cards are drawn. What is the expectation of the sum of the 
numbers appearing on them? 


16. Show that the expectation of the probability of a continuous 
variable is least when the variable is distributed at random. 


REFERENCES FOR OUTSIDE READING


1. COOLIDGE: Probability, pp. 52-73.

2. ARNE FISHER: Mathematical Theory of Probabilities, pp. 49-53.

3. CZUBER: Wahrscheinlichkeitsrechnung, pp. 72-79, 142-146, 239-261.

4. WHITWORTH: Choice and Chance, Chapters IX, X and XI.




CHAPTER VIII 


THE DISTRIBUTION FUNCTIONS MOST FREQUENTLY USED
IN ENGINEERING


§ 81. Introductory


For many years the only distribution function which 
scientists were accustomed to use was the Gaussian or Normal 
Law. This, when written in one variable, as was usually the 
case, was exactly the equation (77). Various derivations of 
it have been given; but none of them is satisfactory in a 
practical sense, for they are all based upon assumptions of 
such a nature that it is quite impossible to judge whether or 
not they are approximated in any actual case. For example, 
it is very frequently assumed that the deviation of any given 
result from expectation is due to the superposition of a very 
large number of contributory causes, no one of which is com- 
parable in magnitude to the combined effects of all the rest, 
and which are just as likely to produce positive as negative 
deviations. It is generally hopeless to attempt to justify the 
use of a formula based upon such hypotheses as these, for we 
do not ordinarily have any clear-cut picture of the causes of 
deviation to begin with, either as to number or as to the 
matter of their tendency to produce opposite effects with 
equal frequency. 

In addition to this purely theoretical objection to such
proofs, there is the further objection that experience has
taught that very few sets of experimental data appear to follow
the law; in fact, very few of them are even symmetrical.
Hence, though it can do us no harm to know how far men
have been able to go in putting a foundation under an ancient
monument, we shall be wise not to overlook the fact that it
is, in a sense at least, venerable principally for its age. It
has its uses, but it is not divinely ordained for the cure of all
statistical woes.

Other distribution functions which have to do with con-
tinuously varying quantities are in no better logical standing.
It is only in the case of quantities which, in the nature of
things, can take only discrete values (such, for example, as
the number of objects possessing a given property) that any-
thing approaching practically applicable hypotheses has
been found.

It is the purpose of the present chapter to segregate these 
two classes of distribution functions, and so far as is possible 
without too tedious theorizing to show under what conditions 
the individual functions may be expected to be applicable. 
They may then serve as guides in the discussion of such data 
as is actually met in practice, especially in cases where the 
attempt to find the true law by an independent investigation 
appears to be entirely hopeless. In other cases such an 
attempt should undoubtedly be made, provided the problem 
is important enough to justify the labor. It is the only sure 
way. 


§ 82. Distribution Functions for Discrete Variables; The Bino- 
mial Law and Various Approximations to It 


The first law of any general consequence which we met in 
the course of our studies was 


$$P_m(n) = C^m_n\, p^n (1-p)^{m-n}. \tag{23}$$


As we know, it represents the probability of n successes in
m independent trials, if the chance of success in a single trial
is p.

This law is exact, and not only that, we know pretty well 
what the conditions underlying it are. It is true, I suppose, 
that there are comparatively few practical situations in which
the same essential conditions can be maintained for a great
length of time; but there are many problems in which condi- 
tions approximate stability to such an extent that we feel 
no hesitancy in dealing with them on this basis. For example, 
take the production of stamped parts made on a punch press. 
The die in use is certain to wear, and thus produce a progressive 
tendency of some sort; but the trend will usually be slow 
enough that, over a sufficiently limited portion of the life of 
the die, this trend may be ignored. So, too, with many other 
features of the process: sheets differ somewhat in thickness, 
temperatures change, and so on to a great number of factors. 
Yet if we sort the product into two sorts, “ bad” and “ good,” 
the various pieces have something like the same chance of 
being good. 

The Binomial Law, therefore, is one of very broad utility. 
Its chief objectionable feature is the difficulty of computing it, 
particularly when the answer desired is the probability of 
exceeding n instead of equalling it, in which case a large number
of terms might have to be calculated and added together, if
m is large. There are, however, fairly accurate approximations
which can be used in such cases, the foundations for deriving 
which have been laid in §§ 42 and 43. We shall now complete 
the proof. 
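
The labor referred to above can be seen by writing the tail sum out directly. The following is a minimal Python sketch, not part of the original text: the function names `binomial_pmf` and `prob_exceeding` are illustrative, and the values m = 36, p = 1/2 are chosen only as an example. It evaluates the Binomial Law (23) exactly and adds one term for every value above n.

```python
import math

def binomial_pmf(m, n, p):
    """Probability of exactly n successes in m independent trials, as in (23)."""
    return math.comb(m, n) * p**n * (1 - p)**(m - n)

def prob_exceeding(m, n, p):
    """Probability of more than n successes: one term of (23) for each value above n."""
    return sum(binomial_pmf(m, j, p) for j in range(n + 1, m + 1))

if __name__ == "__main__":
    # Illustrative values only (an assumption for this sketch): m = 36, p = 1/2.
    print(prob_exceeding(36, 18, 0.5))
```

The number of terms in the sum grows with m, which is the reason a closed approximation is worth having when m is large.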

To get a mental picture of what the proof is to contain 
we refer back once more to the discussion of Bernoulli’s
Theorem, and in particular to the accompanying Figs. 8 and 9. 
These figures were drawn for the particular value p = 4, 
and present the distribution functions for n successes in
exactly m trials for several values of m. We saw that as the
number of trials is increased the distribution function becomes 
flatter and flatter and spreads out more and more widely 
along the n-axis (Fig. 8), but that when plotted against n/m
as in Fig. 9 it becomes higher and spreads less widely as m
increases. If we carry this process one step further and plot
the distribution function in terms of $n/\sqrt{m}$, we obtain the
set of curves shown in Fig. 23. It is obvious that they are
very similar to one another, and seem to approach a definite
limit as m becomes infinite. It ought to be possible, therefore,
to find some sort of smooth curve which would be a good 
approximation, at least when m is very large. It should be 
noted, however, that in order to get the degree of similarity 
exhibited by Fig. 23 it has been necessary to shift the curves
so as to cause their maxima to fall vertically above one
another.

FIG. 23. An Alternative Form of Fig. 7.

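We begin by replacing all the factorials in (23) by their
Stirling approximations. The result is

$$P_m(n) = \frac{m^{m+\frac12}}{\sqrt{2\pi}\; n^{n+\frac12}\,(m-n)^{m-n+\frac12}}\; p^n (1-p)^{m-n}\, f(m,n), \tag{98}$$

where f(m, n) is written briefly for the series terms

$$\frac{1 + \dfrac{1}{12m} + \dfrac{1}{288m^2} + \cdots}{\left(1 + \dfrac{1}{12n} + \dfrac{1}{288n^2} + \cdots\right)\left(1 + \dfrac{1}{12(m-n)} + \dfrac{1}{288(m-n)^2} + \cdots\right)}$$

which come from (41). It is equivalent to

$$f(m,n) = 1 - \frac{1}{12}\,\frac{m^2 - mn + n^2}{mn(m-n)} + \cdots. \tag{99}$$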

Next we make provision for the change of scale. The 
correct substitution must shift the maxima all to the same 
point (which is best done by shifting them to the origin). 
This is accomplished by introducing a new variable 


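$$\delta = n - pm.$$


As pm is the “expectation of n,” δ is the deviation from expec-
tation. Next we collapse the scale by setting

$$x = \frac{\delta}{\sqrt{m}} = \frac{n - pm}{\sqrt{m}}.$$

Upon substituting this new variable in (98) and making a
few obvious rearrangements, we get

$$P_m(n) = \frac{f(m,n)}{\sqrt{2\pi m p(1-p)}} \left[1 + \frac{x}{p\sqrt{m}}\right]^{-pm - x\sqrt{m} - \frac12} \left[1 - \frac{x}{(1-p)\sqrt{m}}\right]^{-(1-p)m + x\sqrt{m} - \frac12}. \tag{100}$$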


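We now take the two brackets in (100) and treat them by
the same process as was used in § 43. If we call their product
Z, we have

$$\log Z = -\left(pm + x\sqrt{m} + \tfrac12\right)\log\left(1 + \frac{x}{p\sqrt{m}}\right) - \left[(1-p)m - x\sqrt{m} + \tfrac12\right]\log\left(1 - \frac{x}{(1-p)\sqrt{m}}\right).$$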


Upon expanding the logarithms in series and collecting like
powers of m,

$$\log Z = -\frac{x^2}{2p(1-p)} + \frac{2p-1}{\sqrt{m}\,p(1-p)}\left(\frac{x}{2} - \frac{x^3}{6p(1-p)}\right) + \frac{1}{mp(1-p)}\left(\frac{1-2p+2p^2}{4p(1-p)}\,x^2 - \frac{1-3p+3p^2}{12p^2(1-p)^2}\,x^4\right) + \cdots,$$

whence Z itself is equal to e raised to this power. From this
exponential we sort out the term $e^{-x^2/2p(1-p)}$, which is inde-
pendent of m, and then expand the remainder in a series.
The result is

$$Z = e^{-x^2/2p(1-p)}\left[1 + \frac{2p-1}{\sqrt{m}\,p(1-p)}\left(\frac{x}{2} - \frac{x^3}{6p(1-p)}\right) + \frac{1}{mp(1-p)}\left(\frac{3-8p+8p^2}{8p(1-p)}\,x^2 - \frac{2-7p+7p^2}{12p^2(1-p)^2}\,x^4 + \frac{(2p-1)^2}{72p^3(1-p)^3}\,x^6\right) + \cdots\right].$$

Next, the n in f(m, n) must be replaced by the new variable
x, which gives

$$f(m,n) = 1 - \frac{1-p+p^2}{12mp(1-p)} + \frac{x(1-2p)}{12m^{3/2}p^2(1-p)^2} + \cdots.$$

It now becomes obvious that $y = x/\sqrt{p(1-p)}$ would be
a better variable to use, so we make this substitution. As
$\sqrt{mp(1-p)} = \sigma$ is the standard deviation of the Binomial
Law, this new variable satisfies the relation

$$y = \frac{n - pm}{\sqrt{mp(1-p)}} = \frac{\delta}{\sigma} \tag{101}$$

and is therefore the deviation measured in terms of the standard
deviation as a unit. We make this substitution in the series
for Z and f, and multiply them together, thus getting the result

$$P_m(n) = \frac{e^{-y^2/2}}{\sqrt{2\pi}\,\sigma}\left[1 + \frac{2p-1}{\sigma}\left(\frac{y}{2} - \frac{y^3}{6}\right) + \frac{1}{\sigma^2}\left(-\frac{1-p+p^2}{12} + \frac{3-8p+8p^2}{8}\,y^2 - \frac{2-7p+7p^2}{12}\,y^4 + \frac{(2p-1)^2}{72}\,y^6\right) + \cdots\right]. \tag{102}$$

The terms which follow are even more complicated. 

This distribution function still gives $P_m(n)$, for we have
not multiplied it by the Jacobian of the transformation by
which we went from n to y. The result of such a change
would be to replace $\sqrt{2\pi}\,\sigma$ by $\sqrt{2\pi}$: nothing else would change.
This is easily seen from the fact that we have compressed our
curve by just the factor 1/σ, and to keep areas unaltered
we would need to multiply all ordinates by σ.

Now let us see to what result all this tedious algebra has
led us: So long as $y^3/\sigma$ is small, (102) gives a good approxima-
tion to the Binomial Law. Indeed, if $y^2$ is small enough
(which means that we confine our attention to the portion
of the curve near the peak) the first term of (102) is good
enough. But this term, written for $P_m(y)$ instead of $P_m(n)$, is

$$P_m(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}, \tag{103}$$

which is just the Normal Law.

Moreover, when dealing with the special case p = 1/2, the
odd terms in (102) vanish because of the presence of the
factor (2p − 1). Hence:

The Normal Law (103) is a fair approximation to the Binomial
Law so long as $y^3/\sigma$ is not too large. In the special case of
p = 1/2 it is somewhat better than otherwise. In the vicinity of
the “tails” (that is, when the deviations are large) it is never
satisfactory.

FIG. 24. Approximations to the Binomial Law. (Circles: Binomial Law; curve label: “Both Approximations Coincide.”)

FIG. 25. Approximations to the Binomial Law. (Circles: Binomial Law; curve label: “Normal Approximation.”)


A statement of this sort is likely to be rather vague unless 
it is emphasized by means of some sort of graphical presenta-
tion. With this in mind Fig. 24, which corresponds to m = 36
and p = 1/2, and Fig. 25, which corresponds to the same m
but to p = 75, are presented. In each case the circles repre- 
sent the exact values of the Binomial Law; they occur, of 
course, only at integral values of x. The curves represent 
our approximations to these values. 

Dealing first with the symmetrical case of Fig. 24, we 
note that the circles and curve coincide absolutely, so far as 
is possible to judge from the main portion of the drawing. 
What is more, the rough approximation (103) and the more 
exact approximation (102) are so nearly coincident that it 
has been entirely impossible to represent them separately. 
This would be true right out to n = 36 if we were to continue
to use the same scale. But if we magnify the ends of the
curve, as has been done at the right-hand margin of the draw-
ing, we find that they do indeed separate, and that the higher
approximation represents the true values very much better
than does the Normal Law (103). For example, at n = 30
(the extreme edge of the drawing) the Normal Law is in error 
by more than 50 per cent, whereas the higher approximation 
is still indistinguishable from the true value on the scale of 
the drawing. 

Next, turning to Fig. 25, we find that the Normal Law 
nowhere represents the Binomial with any great degree of
exactness, whereas the higher approximation coincides very
well throughout the entire range. 
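
The comparison described for the symmetrical case can be checked numerically. The sketch below is an illustration only and not taken from the text: the function names are hypothetical, and the "higher approximation" is truncated after the terms in 1/σ² (the Normal Law multiplied by 1 + (2p − 1)(y/2 − y³/6)/σ plus the second-order bracket worked out from the expansion of log Z), whereas the printed series carries further terms.

```python
import math

def binomial_law(m, n, p):
    """Exact Binomial Law: probability of n successes in m trials."""
    return math.comb(m, n) * p**n * (1 - p)**(m - n)

def normal_law(m, n, p):
    """Normal approximation to P_m(n): exp(-y^2/2) / (sqrt(2*pi) * sigma)."""
    sigma = math.sqrt(m * p * (1 - p))
    y = (n - m * p) / sigma
    return math.exp(-y * y / 2) / (math.sqrt(2 * math.pi) * sigma)

def higher_approximation(m, n, p):
    """Normal Law times correction terms through 1/sigma^2 (truncation assumed for this sketch)."""
    q = 1 - p
    sigma = math.sqrt(m * p * q)
    y = (n - m * p) / sigma
    # First-order term; it vanishes when p = 1/2 because of the factor (2p - 1).
    first = (2 * p - 1) / sigma * (y / 2 - y**3 / 6)
    # Second-order term; note 1 - p + p^2 = 1 - pq, 3 - 8p + 8p^2 = 3 - 8pq, etc.
    second = (-(1 - p * q) / 12
              + (3 - 8 * p * q) / 8 * y**2
              - (2 - 7 * p * q) / 12 * y**4
              + (2 * p - 1)**2 / 72 * y**6) / sigma**2
    return normal_law(m, n, p) * (1 + first + second)

def relative_error(approx, exact):
    """Absolute error of the approximation as a fraction of the exact value."""
    return abs(approx - exact) / exact

if __name__ == "__main__":
    m, p, n = 36, 0.5, 30
    exact = binomial_law(m, n, p)
    print("normal law error:", relative_error(normal_law(m, n, p), exact))
    print("higher approximation error:", relative_error(higher_approximation(m, n, p), exact))
```

With m = 36, p = 1/2 and n = 30 the Normal Law comes out more than 50 per cent too large, in agreement with the remark made about Fig. 24, while even this truncated series stays within a few per cent of the exact value.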




§ 83. Distribution Functions for Discrete Variables; The Poisson 
Law as a Limiting Case of the Binomial Law 


The second important distribution function in the case 
of discrete variables is the Poisson Law. It is usually regarded 
as an approximate form of the Binomial Law when the number 
of trials m is very large and the probability p very small; and 
as its derivation from this point of view is the simplest, we 
shall begin our discussion with it, though we shall shortly 
see there is another point of view which is of much greater 
practical importance. 

We have found in § 77 that the expectation of n in the
case of the Binomial Law is ε = mp. Suppose, then, that we
replace the p in (23) by ε/m, thus causing the formula to take
the form

$$P_m(n) = C^m_n \left(\frac{\epsilon}{m}\right)^n \left(1 - \frac{\epsilon}{m}\right)^{m-n}. \tag{104}$$


By writing out the binomial coefficient and rearranging the 
factors slightly, this expression can be thrown into the alterna- 
tive form 


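$$P_m(n) = \left[\left(1 - \frac{1}{m}\right)\left(1 - \frac{2}{m}\right)\cdots\left(1 - \frac{n-1}{m}\right)\right] \left[\left(1 - \frac{\epsilon}{m}\right)^{-n}\right] \left[\left(1 - \frac{\epsilon}{m}\right)^{m}\right] \frac{\epsilon^n}{n!}.$$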


Now, remembering that we are dealing with a case in
which p is supposed to be very small, it is obvious that only
those values of n are of consequence which are very small
compared to m. Hence every one of the group of terms
enclosed in the first set of brackets is of just about unit magni-
tude. The same is true also of the quantity 1 − ε/m which
occurs in the second and third brackets, for ε/m, or p,
is very small. Hence it follows, since there are compara-
tively few of these terms in the first two brackets, that
their product is also not greatly different from unity.

In the case of the third bracket, however, this argument
cannot be applied; for the power to which the quantity
1 − ε/m is raised is not a moderate one, but instead is very
large. We have seen in § 43, however, that an expression
of this form is approximately equal to $e^{-\epsilon}$, so that we are
justified in concluding that (104) is equivalent to

$$P(n) = \frac{\epsilon^n e^{-\epsilon}}{n!}. \tag{105}$$


Just how good this approximation is depends upon the
values of m, n, and ε; of course whatever it is, we could readily
improve it to any degree we might desire by the use of processes 
similar to those in §82. The result, however, appears to be 
of little value and probably does not warrant its presentation. 
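
How quickly the Binomial Law settles down to the Poisson form can be seen by holding the expectation fixed and letting m grow. The following Python sketch is only an illustration under assumed values (ε = 0.61, n = 1, and the three trial counts are arbitrary choices); the function names are hypothetical.

```python
import math

def binomial_with_fixed_expectation(m, n, eps):
    """Formula (104): the Binomial Law with p replaced by eps/m."""
    p = eps / m
    return math.comb(m, n) * p**n * (1 - p)**(m - n)

def poisson(n, eps):
    """Formula (105): eps^n * exp(-eps) / n!."""
    return eps**n * math.exp(-eps) / math.factorial(n)

def gaps(n, eps, trial_counts):
    """Absolute difference between (104) and (105) for each number of trials m."""
    return [abs(binomial_with_fixed_expectation(m, n, eps) - poisson(n, eps))
            for m in trial_counts]

if __name__ == "__main__":
    for m, gap in zip([10, 100, 1000], gaps(1, 0.61, [10, 100, 1000])):
        print(m, gap)
```

Each tenfold increase in m reduces the discrepancy by roughly a factor of ten in this example, which is the sense in which the approximation improves as p = ε/m becomes small.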

The consequential thing is, that if p is small enough and m 
large enough, the Binomial Law reduces approximately to the 
form (105), which is exactly the Poisson Law. That these 
conditions are sometimes satisfied with sufficient approxima-
tion to warrant the use of the simpler law we may easily
show by citing a particular example which, because of its 
unusual subject matter, has become classical. 

Certain army records, extending over a period of years, 
give among other things the number of soldiers killed by the 
kick of horses. The classified results are shown in Table XV. 
The numbers in the first column represent the number of 
soldiers killed in this way in one corps during one year, and 
the second column tells how often this record was repeated 
during the period covered by the data. 

Now there are a large number of days in a year, and the 
chance of a fatality occurring during any one of these days 
is pretty small. What is more, each day is a sort of inde- 
pendent “trial”; so there is some reason for expecting the 
data of Table XV to follow (105) rather closely. To check 
this supposition there are given, in the third column of the 
table, the number of times the various records would be 
expected to have occurred if the distribution accurately obeyed 
the Poisson Law with an expectation of ε = 0.61. We have
at present no better means of checking the agreement of the
second and third columns than the mere observation that
there does not appear to be any serious disagreement between
them. Later on, in Chapter IX, when we have developed a 
scientific method of measuring this agreement, we shall find 
that it is very good indeed. 


TABLE XV

RECORDS OF SOLDIERS DYING FROM THE KICK OF HORSES

Number of    Frequency    Frequency
 Deaths      Observed     Expected

    0           109         108.7
    1            65          66.3
    2            22          20.2
    3             3           4.1
    4             1           0.6
    5             0           0.1
    6             0           0.0
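
The third column of Table XV can be reproduced directly from the Poisson Law. The sketch below is an illustration, not the author's computation: it takes the observed frequencies as read from the table (with 22 corps-years showing two deaths), forms their mean, and multiplies the Poisson probabilities by the total number of records. The names `observed`, `eps` and `expected_frequency` are chosen here for convenience.

```python
import math

deaths = [0, 1, 2, 3, 4, 5, 6]
observed = [109, 65, 22, 3, 1, 0, 0]   # second column of Table XV

total = sum(observed)                  # number of corps-years in the record
# Mean number of deaths per corps-year, used as the expectation.
eps = sum(n * f for n, f in zip(deaths, observed)) / total

def expected_frequency(n, eps, total):
    """Number of records with n deaths if the data obeyed the Poisson Law exactly."""
    return total * eps**n * math.exp(-eps) / math.factorial(n)

expected = [round(expected_frequency(n, eps, total), 1) for n in deaths]

if __name__ == "__main__":
    print("expectation:", eps)
    for n, obs, exp in zip(deaths, observed, expected):
        print(n, obs, exp)
```

The mean of the observed column is 0.61, the expectation quoted in the text, and the rounded frequencies agree with the third column of the table.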


§ 84. Definitions of the Phrases “ Individually at Random” 
and “ Collectively at Random” 


We can make better use of this illustration, however, than 
that of merely showing that data sometimes obeys the 
Poisson Law. We can use it as a guide to the conditions under 
which the Poisson Law is exactly, rather than approximately, 
applicable. 

We notice, to begin with, that the times at which the various 
deaths occurred determine points upon the time axis. If we 
consider any one death (that is, the death of a particular man), 
we have no reason to suppose that it occurred at one instant 
rather than another.¹ That is, it is as likely to fall in one


1 We must, of course, not be hypercritical about this statement. Certain points
on the time axis correspond to periods when the men were asleep, and it is quite 
unlikely that a death would occur at such an instant. There probably were also 
different routines established for week days and Sundays, and these might affect 
the likelihood of the point lying at one place rather than another. Perhaps, however, 
the fact that what we say about this illustration is not true when viewed in too great 
detail may serve the more clearly to point out the idea we are aiming to convey. 




interval of specified length as another; or in the terms of § a 
it is placed on the line “ at random.” 

Then there is the additional fact that the number of deaths 
occurring during a particular interval is not in any way 
influenced by what has happened during other intervals. 
We do not mean by this that there is no sort of connection 
between them whatever, for obviously there exists the con- 
nection implied by the words “ the chance of a certain number 
of deaths in this interval is the same as for any other interval 
of like length.” What we mean can probably best be illus- 
trated by giving an example of the contrary situation. Suppose 
we were told that in a particular year just three men were 
killed, and that in looking over the records for the first six 
months we found that exactly two had been killed during that 
period. Obviously this information does influence our judg- 
ment as to what happened in the other half: from the infor- 
mation at our disposal we can conclude that just one death 
occurred in this half. But if we were the statistician who kept 
the records and at the middle of a year observed a slight 
excess of deaths for the first half, we would not be able to 
conclude anything at all about what would happen during 
the second half. There might be either an excess or a deficit;
it is all a matter of chance.¹

There are many situations about which entirely similar 
observations might be made. In Chapter X, in dealing with 
Traffic Problems, we shall have need to return to them again 
and again. We therefore frame a pair of definitions which 
shall cover, once for all, the essential ideas with which these 
observations deal: 


1 This statement is also probably not true. An excess observed by a statistician 
at the middle of the year would probably lead — if it were serious enough — to some 
sort of “safety campaign” intended to reduce the number of such accidents. Or, 
viewed from a somewhat closer angle, if the men themselves had immediate knowledge 
of the occurrence of such deaths, they would probably be led to exert greater care 
the day following an accident than at other times, and the probability of a death in 
an interval of a day just following the occurrence of such an accident would be less
than normal.
It is just exactly influences of this kind that we wish to rule out in what follows.


A set of points is said to be distributed “individually at
random” along a line segment provided each point of the set is
placed at random, independently of all the rest.


A set of points is said to be distributed “collectively at random”¹
along a line segment provided the probability of any interval dx 
containing n points is independent of the number of points in any 
interval not wholly or partly included in dx. 


We have already observed that no set of points can be
collectively at random if the total number on the segment is
fixed in advance, for under those circumstances an excess in
any one interval reduces the chance of an excess, and increases
the chance of a deficit, in other similar intervals. This, of
course, violates the definition of “collectively at random.”
It is equally easy to set up an illustration in which the
points are “collectively” but not “individually” at random.
Suppose, for example, that a record is kept of the instants
at which people pass a certain point on a bridge. Some of
them will come on foot, some will pass in automobiles, and 
occasionally whole train-loads will pass in subway cars. If 
we idealize the problem to the extent of ignoring the finite 
dimensions of conveyances, we may say that, while some 
persons pass individually, others go by in groups at identical 
times. The time axis therefore comes to carry a set of points 
so arranged that the probability of n persons passing during
any interval is the same for all intervals.² Furthermore, the


1 This use of the term “collectively at random”’ must not be confused with the 
use which has been made of it by Mr. E. C. Molina in connection with variables 
which can take only discrete values. (See, for example, the Bell System Technical 
Journal for November, 1922.) Mr. Molina’s use of the term can be described as 
follows: 

The placing of a point picks out for us some one of the possible values which the 
variable may take. Similarly, the placing of n points picks out n values. If any
such combination of n is equally as likely as any other, the n points are said to be
“collectively at random.” It is implied, of course, that the same value cannot occur 
more than once in such a set of n. 


2 The reader will have to approach this illustration in a rather tolerant frame of
mind and forget for the moment that there are such things as, on the one hand, traffic
police who cause massing of traffic, and, on the other, subway schedules which produce


knowledge of the number of people who have gone by during 
the last minute (say) has no effect upon the number that 
are likely to go by during the next minute to come. The 
points are therefore arranged “collectively at random.” 
But they are not “individually at random,” for the point 
corresponding to a subway passenger is not placed independ- 
ently of the points corresponding to the other passengers in 
his train, but is coincident with them. 

We thus have illustrations of (a), a set of points which is
individually but not collectively at random, and (b), another
set of points which is collectively but not individually at random.
It may be well to round out the ideas by giving a further 
illustration of a set which is neither individually nor collectively 
at random. Suppose our bridge traffic were regulated in 
such a way that no two subway trains had a clearance of less 
than one minute. Then it would certainly be true that the 
probability of a train-load of passengers passing within the 
next half minute if a train-load were known to have passed 
within the last half minute is zero. This violates the definition 
of “collectively at random.” As the illustration still violates 
the definition of “individually at random,” the traffic is not 
at random in either of these senses. 

We add one more illustration of a set of points distributed 
both individually and collectively at random; this time it is 
one which is not vitiated by considerations regarding human 
conduct, as was the one considered at the beginning of the 
section, but it has its own drawbacks nevertheless. 

The emission of β-rays¹ from radioactive substances such


a certain regularity of dispersion. The illustration is far from being a perfect one, 
but it conveys the essential idea better, perhaps, than an artificial system set up for
the purpose.
It is a characteristic of abstract logical ideas never to be met in a pure form in
actual life.

1 The term “β-rays” is applied to the electrons spontaneously ejected from the
nucleus of an atom. When one is emitted, the substance transmutes itself into a
different chemical element, which may have properties quite different from the original
one. It may also be radioactive; but as it is no longer the original element, we may
truthfully say that an atom emits a β-ray only once.


as radium appears to be entirely spontaneous: that is, a 
nucleus seems impelled to eject an electron, not by what is 
going on around it, but by some inner “ predisposition ’”’— 
if the word can be allowed — which we do not understand. 
If we choose any atom and watch it for an interval dt there 
is a certain probability of it emitting an electron during this 
interval. As that probability is the same no matter what 
we may know about other atoms, and no matter what interval
we choose, the emissions are “individually at random.” 

If we watch a group of m atoms, the probability of n
emissions during dt is absolutely independent of what may
have happened during any preceding time,¹ for, as we have
said, the emission of β-rays is not affected by events outside
the nucleus. 

Hence the points marked on the time axis by the successive 
emissions are both individually and collectively at random.


§ 85. Second Demonstration of the Poisson Law 


We have seen that it is possible to have sets of points 
with both of the attributes of randomness defined in § 84, 
or with neither, or with either one but not the other.² We
now purpose to show that any set which possesses both these 
properties is distributed according to Poisson’s Law. Specific- 
ally, we shall prove the following theorem: 


1 If we chose a given sample of our substance and watched it, the atoms of which
it is composed would gradually transmute themselves into something else. Hence,
as time went on, the number under observation would decrease. We shall find this
worth thinking about a little later. For the moment we may suppose that a new
atom is somehow fed into the group under observation whenever one leaves it by
transmutation.

2 It is even possible to have a set of points covering a total interval (a, c), but
satisfying different conditions within the two parts (a, b) and (b, c). A simple illus-
tration would result (if a, b and c are instants of time) from marking upon the time
axis the instants at which β-ray emission took place from a sample of substance, but
removing part of the sample at the instant b.


If a set of points is distributed individually and collectively
at random in the interval (a, b), the probability of n points lying
within any subinterval of length x is

$$P(n, x) = \frac{(kx)^n e^{-kx}}{n!},$$

where kx is the expected number of points within the subinterval.
The proof will consist of four parts: 


1. That the probability of just one point in an infinitesimal 
interval dx is an infinitesimal of the same order as dx, while the 
probability of more than one point in such an interval is an 
infinitesimal of higher order. 


2. That the probability of n points in an interval of length
x, when regarded as a function of x, is differentiable.


3. That it has the value stated in the theorem.
4. That kx is the expected number of points in the length x.


1. By definition, the probability that an interval contains
n points is the same for all intervals of like length, independ-
ently of any information we may have concerning other
intervals external to them. This is true of n = 0, as of every
other value of n.

If we divide the interval x into a number of elements of
length dx, x will have no points in it if, and only if, the same
is true of every one of its subintervals. Hence, the prob-
ability that x is without a point is equal to the probability
that every subinterval is without a point.

Let us denote by P(0, x) the probability that there are no
points in x; by P(0, dx) the probability that there are no
points in dx; and by P(> 0, dx) the probability of one or
more points in dx. Then

$$P(0, dx) = 1 - P(>0, dx)$$

and, since there are x/dx elements in x,

$$P(0, x) = [P(0, dx)]^{x/dx} = [1 - P(>0, dx)]^{x/dx}.$$

This equation only states in symbols what was said in words
above: that the interval x is only without points when the
same is true of every dx.


Taking logarithms of both sides we obtain

$$\log P(0, x) = \frac{x}{dx}\log\left[1 - P(>0, dx)\right] = -x\left\{\frac{P(>0, dx)}{dx} + \frac{[P(>0, dx)]^2}{2\,dx} + \cdots\right\}. \tag{106}$$


We may consider three possibilities: 

(a) As dx approaches zero P(> 0, dx)/dx may approach
zero. If so, all the remaining terms on the right-hand side
of the equation also vanish,¹ which leads to the conclusion
that

$$\log P(0, x) = 0,$$

or

$$P(0, x) = 1,$$


independently of the value of x. This would carry with it 
the conclusion that, no matter how long the interval might 
be, the probability of its containing a point would be zero. 
This could only happen provided the points were infinitely 
far apart. This, of course, is unsatisfactory. 

(b) Next suppose P(> 0, dx)/dx approaches infinity. Then
every term in the right-hand member of (106) is infinite,
whence P(0, x) is zero. This means that no matter how
small the interval x may be, it is certain to include at least
one point. This also is unsatisfactory.

(c) The final possibility is that P(> 0, dx)/dx approaches 
a limit k which is neither zero nor infinity. In this case 
P(>0,dx) approaches kdx, its higher powers approach 
higher powers of dx, and all terms except the first vanish in the 
series (106). This gives 


$$\log P(0, x) = -kx,$$

or

$$P(0, x) = e^{-kx}.$$

1 We know the series to be uniformly convergent for small enough values of
P(> 0, dx). Hence we are justified in saying the entire series approaches zero because
each term does.


Of the three alternatives, the last is the only one which
allows a reasonable value for P(0, x). It must therefore be the
correct one. That is, the probability of one or more points in
the element dx must be an infinitesimal of the first order in dx.

This, however, is not quite what we set out to prove; for 
it does not necessarily follow that the chance of more than one 
is an infinitesimal of higher order, nor even that P(1, dx) itself is 
of the first order. The conclusion is merely that at least one
of the P’s vanishes to the first order with dx, and that the 
rest are infinitesimals of at least as high order. If, however, 
we can show that the probability of two or more points in an 
interval dx is an infinitesimal of higher order than the prob- 
ability of one or more, we will have completed the proof that 
P(1, dx), and it only, is of the first order in dx. 

To prove this is quite simple. For the event “an interval
contains more than one point” is the logical equivalent of the
two events “it contains at least one point” and “it contains
at least one other.” The probabilities of these three events
are represented in our scheme of symbolism by the expressions
P(> 1, dx), P(> 0, dx) and $P_{>0}(>1, dx)$, respectively; the
last one being the conditional probability of more than one
point in an interval in which there is one or more. We have
at once, therefore, by the use of (20)

$$P(>1, dx) = P(>0, dx)\, P_{>0}(>1, dx)$$

or

$$P_{>0}(>1, dx) = \frac{P(>1, dx)}{P(>0, dx)}.$$

Now let an interval of length dx be chosen about one of
the points. The chance that it contains other points is just
$P_{>0}(>1, dx)$. Let us, then, take the limit of this probability
as the length of the interval dx vanishes. The limit is obviously
the probability of a second point coinciding with the first,
which we have already seen to be zero for points which are
placed individually at random. Hence $P_{>0}(>1, dx)$ must
approach zero with dx.

Finally, we notice that

$$P(1, dx) = P(>0, dx) - P(>1, dx) = P(>0, dx)\left[1 - P_{>0}(>1, dx)\right]$$

whence we conclude that, in the limit,

$$P(1, dx) = P(>0, dx) = k\,dx.$$


This completes our proof of part (1). 

2. The second point which we are to prove is that P(n, x) 
is differentiable with respect to x. To do this we consider 
two adjacent intervals, one of length x and the other of length 
dx, the two together constituting an interval of length x + dx 
when we wish to think of them in combination. Now it is 
obvious that the compound interval x + dx can only contain 
n points provided these are distributed between the two 
component intervals in some one of the combinations 

n in x and 0 in dx, 
n − 1 in x and 1 in dx, 
n − 2 in x and 2 in dx, 
. . . . . . . . . . 
0 in x and n in dx. 

As the probability of a specified number of points lying in 
either of these separate intervals is independent of the condition 
of its neighbor, we conclude that 

$$P(n, x + dx) = P(n, x)\,P(0, dx) + P(n-1, x)\,P(1, dx) + \cdots + P(0, x)\,P(n, dx).$$


However, P(0, dx) must satisfy the relation 

$$P(0, dx) = 1 - P(1, dx) - P(2, dx) - \cdots.$$




Substituting this value in the preceding equation and rearranging 
it slightly we obtain 

$$\frac{P(n, x+dx) - P(n, x)}{dx} = [\,P(n-1, x) - P(n, x)\,]\,\frac{P(1, dx)}{dx} + [\,P(n-2, x) - P(n, x)\,]\,\frac{P(2, dx)}{dx} + \cdots.$$


We already know that [P(1, dx)]/dx approaches a positive 
number k as dx approaches zero, and also that [P(2, dx)]/dx, 
[P(3, dx)]/dx,..., all approach zero. Hence the entire expres- 
sion on the right-hand side of the equation approaches a 
definite limit: the same must therefore be true of the left side 
also. As this is the necessary and sufficient condition for the 
differentiability of P(n, x), we have completed our proof. 

But we not only know that the derivative exists, we also 
know its value; for in the limit we have¹ 

$$\frac{dP(n)}{dx} = k\,[\,P(n-1) - P(n)\,]. \tag{107}$$


3. We are now ready for the third point of our proof; 
that is, for deriving formulae for P(n). Equation (107) is a 
linear differential equation;² its solution is therefore 

$$P(n) = c_n e^{-kx} + k\,e^{-kx}\int_0^x e^{kx}\,P(n-1)\,dx. \tag{108}$$


¹ We have no further use for the symbols containing dx. Hence we run no risk 
of confusion in writing P(n) instead of P(n, x). 

² Any differential equation of the form 

$$\frac{dy}{dx} + f_1(x)\,y = f_2(x)$$

is a “linear differential equation of the first order.” Equations of this kind can 
always be solved, the general formula for the solution being 

$$y = e^{-\int_0^x f_1(x)\,dx}\left[\,C + \int_0^x e^{\int_0^x f_1(x)\,dx}\,f_2(x)\,dx\right],$$

where C is the constant of integration. How this solution is obtained does not concern 
us here. Its correctness can be established by substituting it in the original equation. 

In our application of it, y = P(n), $f_1(x) = k$ and $f_2(x) = kP(n-1)$. Hence 
$\int_0^x f_1(x)\,dx = kx$, and (108) follows at once. 




Here $c_n$ is the constant of integration, which must be 
determined in accordance with known conditions of the 
problem. We have seen, however, that when n ≠ 0, P(n) 
vanishes as x becomes zero. But when x is zero, the limits 
to the sign of integration become equal, and the integral 
itself vanishes. Therefore all terms in (108) except $c_n e^{-kx}$ 
are known to vanish, whence it follows that the equation can 
only be true provided $c_n$ also vanishes. 

We can now find the P’s one by one. From the first part 
of our proof we know that 

$$P(0) = e^{-kx}.$$

Setting n equal to 1, 2 and 3 in succession in (108) we obtain 

$$P(1) = kx\,e^{-kx},$$

$$P(2) = \frac{(kx)^2}{2!}\,e^{-kx},$$

$$P(3) = \frac{(kx)^3}{3!}\,e^{-kx}.$$

This suggests the law 

$$P(n) = \frac{(kx)^n}{n!}\,e^{-kx}. \tag{109}$$

We can establish the truth of this guess by showing that, 
if it is true for one value, n − 1, it must be true for the next 
higher value, n, also. Upon substituting 

$$P(n-1) = \frac{(kx)^{n-1}}{(n-1)!}\,e^{-kx}$$

in (108) and integrating, the result (109) is obtained. It 
follows, therefore, that if the law is true for any value of n 
whatever, it must be true for the next higher value also. 
It is true, however, when n equals 1, 2 or 3. Hence it must 
be true for every positive integer. 
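
The law (109) can also be checked numerically. The following Python sketch is an illustration only, not part of the proof: it integrates equation (107) in small steps, starting from P(0) = 1 and P(n) = 0 for n > 0 at x = 0, and compares the outcome with (109). The values k = 2 and x = 1.5, the number of steps, and the function names are arbitrary assumptions made for the example.

```python
import math

def poisson(n, kx):
    """Equation (109): chance of exactly n points when the expectation is kx."""
    return kx ** n * math.exp(-kx) / math.factorial(n)

def integrate_107(k, x_end, n_max, steps=20000):
    """Step-by-step (Euler) integration of dP(n)/dx = k[P(n-1) - P(n)].

    Starting values at x = 0: P(0) = 1 and P(n) = 0 for every n > 0.
    """
    h = x_end / steps
    p = [1.0] + [0.0] * n_max
    for _ in range(steps):
        prev = p[:]
        p[0] = prev[0] - h * k * prev[0]          # no P(-1) term when n = 0
        for n in range(1, n_max + 1):
            p[n] = prev[n] + h * k * (prev[n - 1] - prev[n])
    return p

k, x = 2.0, 1.5   # illustrative values only
numeric = integrate_107(k, x, n_max=6)
exact = [poisson(n, k * x) for n in range(7)]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
print("largest disagreement with (109):", max_error)
```

The disagreement shrinks as the number of steps is increased, which is what we should expect if (109) is the solution of (107).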

4. The final point in our discussion is to show that kx is 
the expectation of the number of points in length x. By 
definition, this expectation is 

$$\varepsilon(x) = \sum_{n=0}^{\infty} n\,P(n) = e^{-kx}\sum_{n=1}^{\infty}\frac{(kx)^n}{(n-1)!} = kx\,e^{-kx}\left[\,1 + (kx) + \frac{(kx)^2}{2!} + \cdots\right].$$

The series in brackets is recognized at once as the expansion 
of $e^{kx}$. Hence ε(x) = kx. 

This completes our proof, for since kx is the expectation of 
n, (109) does indeed reduce to the form (105). 
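
The summation just carried out is easily confirmed by direct arithmetic. In the sketch below (Python, with k = 0.8 and x = 5 chosen arbitrarily so that kx = 4, and with the series cut off at n = 100 as a practical assumption) the terms of (109) are built up by the ratio P(n)/P(n − 1) = kx/n, and both the total probability and the expectation are added up.

```python
import math

def poisson_terms(mean, n_max):
    """Return [P(0), ..., P(n_max)] from (109), using P(n) = P(n-1) * mean / n."""
    terms = [math.exp(-mean)]
    for n in range(1, n_max + 1):
        terms.append(terms[-1] * mean / n)
    return terms

k, x = 0.8, 5.0                      # illustrative values, so that kx = 4
P = poisson_terms(k * x, 100)        # terms beyond n = 100 are negligible here

total = sum(P)                                       # should be very nearly 1
expectation = sum(n * p for n, p in enumerate(P))    # should be very nearly kx
print(total, expectation)
```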


§ 86. Discussion of the Poisson Law; Problems to Which It is 
an Appropriate Solution 


We have now derived Poisson’s Law in two ways, the 
essential differences between which should not escape notice. 
In the first place, according to the method used in § 83 the 
law was merely an approximation which could be safely used 
provided we knew that the data were distributed in accordance 
with the Binomial Law, and also that m was very large com- 
pared with «. The second method of proof obtains the 
formula as an exact solution, not an approximation. It says 
nothing whatever about the magnitudes of the numbers x 
and 7, and requires no knowledge as to the number of points 
within any interval large or small. Instead, it lays down 
certain general assumptions as to the nature of our knowledge 
—or, rather, lack of knowledge— concerning the way in 
which these points follow one after another, asserting: (1) that 
the probability of one or more points within a definite interval 
is not influenced by any knowledge we may have concerning 
the states of other intervals; and (2) that each point in any 
such interval lies at random, independently of all the rest. 
The second method of derivation is frequently a great 
comfort, as we can see if we consider the incidence of telephone 




calls in a telephone exchange. These calls certainly do not fall 
collectively at random for any great length of time; for the 
probability of a large number of calls within a minute at three 
o’clock in the morning is much smaller than in a similar 
interval at three o’clock in the afternoon. If the incidence 
of each call were plotted upon a time axis covering an entire 
day, there would be certain periods in which the points were 
very dense, and other periods in which they were sparsely 
scattered. If, however, we choose a period of a quarter of an 
hour, say, from that part of the day when the traffic appears to 
be heaviest, or from any other portion except those in which 
there is a very definite tendency toward change of density, 
it will be approximately true that any small subinterval has the 
same likelihood of containing n points as has any other interval 
of like length. Furthermore, there is very slight dependence 
among the individual calls. Throughout this quarter of an 
hour, therefore, the distribution of calls is approximately at 
random, both individually and collectively. We conclude 
that Poisson’s Formula may be applied to any time interval 
whatsoever lying wholly within the quarter of an hour, or even 
to the quarter of an hour itself. 

This we can infer from our second method of proof. From 
the first method, on the other hand, we could only infer that 
the Poisson Law could be applied to an interval which was 
of sufficiently short duration compared with the quarter of 
an hour,¹ and we would have no criterion for determining 
what the words “sufficiently short” mean. 
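
The limitation of the first method can be seen in figures. Suppose exactly m calls were known to fall within the quarter of an hour, each individually at random, so that the number falling in a subinterval of x minutes follows the Binomial Law with p = x/15. The Python sketch below compares that law with the Poisson formula of the same expectation. The figure m = 60 and the two subinterval lengths are hypothetical values chosen only for illustration.

```python
import math

def binomial(n, m, p):
    """Binomial Law: chance that exactly n of m points fall in the subinterval."""
    return math.comb(m, n) * p ** n * (1 - p) ** (m - n)

def poisson(n, mean):
    """Poisson Law with the given expectation."""
    return mean ** n * math.exp(-mean) / math.factorial(n)

m = 60   # hypothetical: exactly 60 calls known to lie in the 15 minutes

def worst_gap(x_minutes):
    """Largest difference between the two laws for a subinterval of x minutes."""
    p = x_minutes / 15
    mean = m * p
    return max(abs(binomial(n, m, p) - poisson(n, mean)) for n in range(m + 1))

short_gap = worst_gap(0.5)    # p = 1/30: the two laws nearly agree
long_gap = worst_gap(10.0)    # p = 2/3: the two laws differ widely
print(short_gap, long_gap)
```

Under the assumptions of the second method no such restriction arises, because the number of calls in the quarter of an hour is not supposed to be known in advance.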

There are many problems of this general type. We have 
already mentioned the emission of β-rays from a radioactive 
substance. This is probably the best example in physics, 
because of the apparently complete independence of the 
emissions one from another. But there are many others to 


¹ If there were known to be exactly m points in the quarter of an hour, and if they 
were distributed individually at random, the chance of any one of them lying in an 
interval of length x would be p = x/15, if we measure time in minutes. The chance 
of just n in this interval would then be given exactly by the Binomial Law, to which 
the Poisson is known to be an approximation only when p is small. 


which the formula applies in much the same general sense 
as to the problem of telephone calls. We name only a few 
typical ones: 

The electrons emitted from a hot metal (thermions) or from 
a photo-sensitive substance under the influence of light (photo- 
electrons) probably emerge with sufficient independence to 
make the Poisson Law an excellent approximation when applied 
to the number appearing within a given interval of time. 
The number of line surges in a power transmission system 
because of the throwing of switches undoubtedly falls into the 
same class, and the number of bursts of static in radio recep- 
tion probably does. The same is true of demands for service 
in general, whether upon the cashier of a department store, 
the stock clerk of a warehouse, or any similar functionary, 
unless regularity is artificially injected into the system. 

Hence it comes about that the Poisson Law is the funda- 
mental basis upon which are solved most of those problems 
which demand to know the number of persons, or the quantity 
of apparatus, which will be needed to perform a given service. 
The number of operators in a telephone exchange, or the 
number of turnstiles in a subway station, are excellent illus- 
trations. We have not yet arrived at a point where it seems 
wise to undertake the exact discussion of such problems,¹ 
but we can with profit consider a very simple one, which, in 
spite of its simplicity, is so similar to many more practical 
ones as to aid us in orienting ourselves. 

¹ They will be the subject of Chapter X. 

Example 47.—A retail chain store, with limited storage facilities, 
sells on the average 10 boxes of dog-biscuit per week. The usual practice 
is to stock up Monday morning. How many packages should be 
adopted as the standard Monday morning stock, in order not to lose 
more than one sale out of a hundred? 

If each week is begun with n packages in stock, no sales 
will be lost unless the demand exceeds n during the week. 
The chance of a demand for n packages, however, is 

$$P(n) = \frac{10^{n}\,e^{-10}}{n!},$$

for this is exactly the chance of n points (buyers) in unit length 
(a week) when the expectation for that length is 10. 
The expected number of lost sales per week is 

$$\varepsilon = \sum_{j=n}^{\infty} (j - n)\,P(j),$$

while the expected number of purchasers is 10. If, then, we 
were to keep records for a large number of weeks, m, the number 
of purchasers would not differ much from 10m, nor the number 
of lost sales from εm. It follows that, in the long run, the 
proportion of lost sales would be very close to¹ 

$$\frac{\varepsilon}{10} = \frac{1}{10}\sum_{j=n}^{\infty} (j - n)\,P(j).$$


¹ This step in our solution presents an excellent chance for error; for there is 
a treacherous difference between the “expectation of the proportion of sales lost in 
the long run” and the “expectation of the proportion of sales lost per week.” 

The best way to see this, perhaps, is by means of a numerical illustration. Suppose 
that the distribution of customers, instead of following the Poisson Law, were of such 
a nature that either 8 or 12 appeared each week, one number being equally as likely 
as the other. Suppose a Monday morning stock of 11 packages were adopted as 
standard. Then half the weeks would show no lost sales, and half would show one 
sale lost out of 12. The expectation of the proportion of sales lost per week would 
therefore be 

$$\tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot\tfrac{1}{12} = \tfrac{1}{24}.$$

On the other hand, in the long run, there would be 10m prospective customers 
in m weeks, and m/2 lost sales; so that the proportion of sales lost in the long run 
would be 

$$\frac{m}{2} \div 10m = \frac{1}{20}.$$

The two answers are obviously not the same. 

The difficulty lies, not in computing the correct answer, for either computation 
is very simple, but in knowing exactly what it is that we are trying to compute. In 
the present problem “losing one sale out of a hundred” means that, if we were to 
keep records over a very long time, the number of lost sales should be about 1 per cent 
of the number of possible sales, and has nothing whatever to do with the average of 
the proportions lost during the individual weeks. If, however, we were interested 
in the latter, we would proceed as follows: 

If just j customers appear, the proportion of lost sales is (j − n)/j. The chance 
of this occurring is P(j). Hence the expectation is 

$$\sum_{j=n}^{\infty}\frac{j - n}{j}\,P(j).$$

This formula corresponds to the 1/24 in our simpler illustration, whereas the one given 
in the text corresponds to the 1/20. 
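
The arithmetic of this illustration can be reproduced in exact fractions. The short Python sketch below uses only the figures given above (weekly demand of 8 or 12 with equal likelihood, and a stock of 11); the variable names are of course arbitrary.

```python
from fractions import Fraction

demands = [8, 12]   # the two equally likely weekly demands of the illustration
stock = 11          # Monday morning stock

# Expectation of the proportion of sales lost per week.
weekly = [Fraction(max(d - stock, 0), d) for d in demands]
mean_weekly_proportion = sum(weekly) / len(demands)

# Proportion of sales lost in the long run: expected losses over expected demand.
expected_lost = Fraction(sum(max(d - stock, 0) for d in demands), len(demands))
expected_demand = Fraction(sum(demands), len(demands))
long_run_proportion = expected_lost / expected_demand

print(mean_weekly_proportion, long_run_proportion)
```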




To answer the question proposed in Example 47 it is 
necessary to find the smallest value of n for which this expression 
is less than 0.01. This we can best do by a process of 
straightforward computation, but before carrying out the 
arithmetic it is advisable to throw the formula into a slightly 
different form. Obviously our formula for ε/10 is equivalent 
to 

$$\frac{\varepsilon}{10} = \sum_{j=n}^{\infty}\frac{j}{10}\,P(j) - \frac{n}{10}\sum_{j=n}^{\infty}P(j). \tag{110}$$

But 

$$\frac{j}{10}\,P(j) = \frac{j}{10}\cdot\frac{10^{j}\,e^{-10}}{j!} = \frac{10^{j-1}\,e^{-10}}{(j-1)!} = P(j-1).$$

Then, writing j − 1 = j′, 

$$\sum_{j=n}^{\infty}\frac{j}{10}\,P(j) = \sum_{j'=n-1}^{\infty}P(j').$$

Substituting this in (110), and noting that it makes no difference 
whether we call the variable j or j′, we obtain 

$$\frac{\varepsilon}{10} = P(n-1) + \Bigl(1 - \frac{n}{10}\Bigr)\sum_{j=n}^{\infty}P(j).$$


The actual computation is shown in Table XVI. The 
second column contains the values of P(n − 1), taken from 
the table of the Poisson Function given in Appendix VI. The 
third column is obtained from the second by starting from the 
bottom and adding the numbers together, each successive 
partial sum being an entry of the third column. The rest 
of the computation is obvious. 

The least safe stock, under the conditions of the problem, 
is 16. 


TABLE XVI 

 n    P(n − 1)    Σ P(j), j ≥ n    (1 − n/10) Σ P(j)    ε/10 

10    0.12511     0.54207          −0.00000             0.12511 
11    0.12511     0.41695          −0.04170             0.08341 
12    0.11374     0.30322          −0.06064             0.05310 
13    0.09478     0.20844          −0.06253             0.03225 
14    0.07291     0.13553          −0.05421             0.01870 
15    0.05208     0.08346          −0.04173             0.01035 
16    0.03472     0.04874          −0.02927             0.00545 
17    0.02170     0.02704          −0.01893             0.00277 
18    0.01276     0.01428          −0.01142             0.00134 
19    0.00709     0.00719          −0.00647             0.00062 
20    0.00373     0.00345          −0.00345             0.00028 
21    0.00187     0.00159          −0.00175             0.00012 
22    0.00089     0.00070          −0.00084             0.00005 
23    0.00040     0.00030          −0.00039             0.00001 
24    0.00018     0.00012          −0.00017             0.00001 
25    0.00007     0.00005          −0.00007             0.00000 
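
The computation of Example 47 can be repeated by machine. The Python sketch below is a check under stated assumptions, not a substitute for the table: it evaluates the Poisson Law for an expectation of 10 directly rather than from a printed table, cuts the infinite sums off at j = 150, and evaluates the proportion of lost sales both from the definition and from the rearranged form with P(n − 1). The function names are hypothetical.

```python
import math

def poisson_pmf(mean, n_max):
    """Return [P(0), ..., P(n_max)] for the Poisson Law with the given expectation."""
    probs = [math.exp(-mean)]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * mean / n)
    return probs

MEAN = 10.0
P = poisson_pmf(MEAN, 150)   # terms beyond j = 150 are negligible for this mean

def tail(n):
    """Sum of P(j) for j >= n (third column of the table)."""
    return sum(P[n:])

def lost_fraction(n):
    """Rearranged form: P(n-1) + (1 - n/10) * sum of P(j) for j >= n."""
    return P[n - 1] + (1 - n / MEAN) * tail(n)

def lost_fraction_direct(n):
    """Definition: (1/10) * sum of (j - n) P(j) for j >= n."""
    return sum((j - n) * P[j] for j in range(n, len(P))) / MEAN

# Smallest Monday stock that loses fewer than one sale in a hundred.
least_stock = next(n for n in range(1, 100) if lost_fraction(n) < 0.01)
print(least_stock)
```

The two forms agree to within rounding error, and the search stops at the stock stated in the text.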


§ 87. Discussion of the Poisson Law; Variable Traffic Density 
in a Telephone Exchange 


Both the Poisson Law and the mathematical argument by 
means of which we derived it in § 85 are more general than the results 
of that section might lead us to believe. Both, in fact, can be 
applied to traffic in which there is a very definite trend (that is, 
to a distribution of points which tend to pack more densely about 
certain parts of the line than others), provided we know enough 
about the nature of the trend. We may illustrate this by means 
of a problem which is not entirely impractical: the incidence of 
calls in a telephone exchange at a time when the traffic is varying 
rapidly. 

To illustrate exactly what we have in mind, let us think of the 
highly idealized case of an exchange in which there is absolutely no 
traffic whatever before nine o’clock in the morning, and in which the 
traffic density then begins to build up so as to reach a maximum 
at noon. Suppose we know this to be due, not to a tendency of 
individual subscribers to place their calls late in the morning, but 




to the fact that no subscriber arrives at his place of business before 
nine o'clock, and many not until considerably later. Suppose, 
finally, that we have reason to believe that the subscribers arrive 
at a uniform rate, so that the number at work is a linear function 
of the time. Under all these highly artificial conditions, we would 
be justified in the assumption that the chance of a call being made 
during the short interval of time between t and t + dt is proportional 
to the number of subscribers then at work, and therefore also to t 
if nine o’clock is taken as the origin of time. 

Now, it is not our intention to deal with so special a case. Instead, 
we shall be content to assume that the chance of a call coming 
in during the interval between t and t + dt is represented by some 
known function of the time, k(t) dt, which we may suppose to have 
been arrived at by some such considerations as those given above;¹ 
and also that the chance of two or more calls in such an interval is 
an infinitesimal of higher order than k(t) dt. We shall find that 
these two assumptions, which are obviously similar to, but very 
much more general than, our notions of “collectively at random” 
and “individually at random,” respectively, again lead to the 
Poisson Law. 

Let us consider the probability of n calls within an interval of 
length τ beginning at the instant t. We denote it by the symbol² 
p(n, τ, t). It is obvious that if we add a very short interval 
dτ to the end of τ, we must have the relation 


$$p(n, \tau + d\tau, t) = p(n, \tau, t)\,p(0, d\tau, t+\tau) + p(n-1, \tau, t)\,p(1, d\tau, t+\tau) + p(n-2, \tau, t)\,p(2, d\tau, t+\tau) + \cdots.$$

This equation assumes nothing except that the probability of a 
specified number of calls within a particular interval is determined 
solely by the length of the interval and the time at which it begins, 
and not by any knowledge which we may have regarding preceding 
intervals. Furthermore, it is necessarily true that 

$$p(0, d\tau, t+\tau) = 1 - p(1, d\tau, t+\tau) - p(2, d\tau, t+\tau) - \cdots.$$


¹ This k(t) dt will play the same part in our present discussion that the k dx played 
in § 85. There k was the expected number of points per unit length (calls per unit 
time, or “calling rate”); here k(t) is the “instantaneous calling rate.” 

² This should be read “probability of n [calls] within [an interval of length] τ beginning 
at [the instant] t.” 


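Substituting this in the preceding equation and making certain 
minor rearrangements, it becomes 

$$\frac{p(n, \tau + d\tau, t) - p(n, \tau, t)}{d\tau} = [\,p(n-1, \tau, t) - p(n, \tau, t)\,]\,\frac{p(1, d\tau, t+\tau)}{d\tau} + [\,p(n-2, \tau, t) - p(n, \tau, t)\,]\,\frac{p(2, d\tau, t+\tau)}{d\tau} + \cdots.$$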


If we let dτ approach zero in this expression, p(1, dτ, t + τ) will 
approach k(t + τ) dτ by hypothesis, for it is the chance of a new call 
during the infinitesimal interval dτ beginning at the time t + τ; 
while the analogous expressions for two or more calls will be differentials 
of higher order in dτ. Thus we obtain 

$$\frac{d}{d\tau}\,p(n, \tau, t) = [\,p(n-1, \tau, t) - p(n, \tau, t)\,]\,k(t+\tau). \tag{111}$$

This argument applies to the case of n = 0 as well as any other, 
except that, since negative numbers of calls are absurd, the term 
p(n − 1, τ, t) drops out. We thus have 

$$\frac{d}{d\tau}\,p(0, \tau, t) = -\,p(0, \tau, t)\,k(t+\tau),$$

whence 

$$p(0, \tau, t) = c_0\,e^{-K(\tau, t)},$$

K(τ, t) being written briefly for 

$$K(\tau, t) = \int_0^{\tau} k(t+\tau)\,d\tau.$$


Returning to the general equation (111), we note that it, like 
(107), is a linear differential equation, and possesses the solution 

$$p(n, \tau, t) = c_n e^{-K(\tau, t)} + e^{-K(\tau, t)}\int_0^{\tau} e^{K(\tau, t)}\,p(n-1, \tau, t)\,k(t+\tau)\,d\tau.$$


The argument by means of which we determined our arbitrary 
constants in § 85 is of equal force here, and again leads to the conclusion 
that every $c_n$ is zero except $c_0$, which is 1. Hence it is only 
necessary to carry out the successive integrations in order to complete 
our solution. Taking first the case of p(1, τ, t) we have 

$$p(1, \tau, t) = e^{-K}\int_0^{\tau} k(t+\tau)\,d\tau;$$

or, since k dτ = dK, 

$$p(1, \tau, t) = K e^{-K}.$$

Similarly, 

$$p(2, \tau, t) = e^{-K}\int_0^{K} e^{K}\,p(1, \tau, t)\,dK = \frac{K^2}{2!}\,e^{-K},$$

or in general, 

$$p(n, \tau, t) = \frac{K^n}{n!}\,e^{-K}. \tag{112}$$

Formally, at least, this is the solution of our problem. 

In this formula K has a very simple interpretation, similar to 
the interpretation of kx in (109). There kx meant the “expected 
number of points in distance x,” or, in the terms of our present 
example, the “expected number of calls in time x.” Here also K 
is the “expected number of calls in the interval τ”; for it is easy to 
show that 

$$\sum_{n=0}^{\infty} n\,p(n, \tau, t) = K.$$
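
The formal solution may be tested on the idealized exchange described earlier, in which the calling rate grows in proportion to the time elapsed since nine o’clock. The Python sketch below assumes, purely for illustration, the rate k(t) = 4t calls per hour, so that K(τ, t) = 2[(t + τ)² − t²]. It integrates equation (111) in small steps and compares the result with the Poisson form (112). The rate, the interval chosen, and the function names are all assumptions of the example.

```python
import math

def rate(t):
    """Hypothetical instantaneous calling rate k(t) = 4t (t in hours after nine)."""
    return 4.0 * t

def big_k(tau, t):
    """K(tau, t): integral of k over (t, t + tau), in closed form for the linear rate."""
    return 2.0 * ((t + tau) ** 2 - t ** 2)

def law_112(n, tau, t):
    """Equation (112): K^n e^(-K) / n!."""
    K = big_k(tau, t)
    return K ** n * math.exp(-K) / math.factorial(n)

def integrate_111(tau, t, n_max, steps=20000):
    """Step-by-step integration of (111), starting from p(0) = 1 and p(n) = 0."""
    h = tau / steps
    p = [1.0] + [0.0] * n_max
    for i in range(steps):
        k_now = rate(t + (i + 0.5) * h)      # rate at the middle of the step
        prev = p[:]
        p[0] = prev[0] - h * k_now * prev[0]
        for n in range(1, n_max + 1):
            p[n] = prev[n] + h * k_now * (prev[n - 1] - prev[n])
    return p

t, tau = 1.0, 0.5        # the half hour beginning at ten o'clock
numeric = integrate_111(tau, t, n_max=8)
max_error = max(abs(numeric[n] - law_112(n, tau, t)) for n in range(9))
expected_calls = sum(n * law_112(n, tau, t) for n in range(60))
print(max_error, expected_calls, big_k(tau, t))
```

Here K = 2.5, and the summed expectation agrees with it, as the interpretation of K requires.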


It is obvious, then, that the Poisson Law has a much wider 
field of usefulness than would be inferred from the discussion of § 85. 
We must be careful not to swing to the opposite extreme, however, 
and conclude that it is universally applicable to everything which, 
in the loose sense in which the phrase is applied in every day speech, 
occurs “at random.” In order to illustrate the type of situation to 
which it does not apply, we consider still another example. 


§ 88. Discussion of the Poisson Law; The General Problem of 
β-ray Emission 


In a footnote in § 84 we called attention to the fact that as a 
substance emits β-rays it transmutes itself into a new substance, 
and thus reduces the amount of the old substance in the sample 
under observation. This is usually expressed in physics by the 
assertion that the substance “decays.” We now solve the following 
example: 

Example 48.—If the chance of a molecule transmuting itself 
within a time element dt is kn(t) dt, where n(t) is the number of untransmuted 
molecules, and if there were N molecules present at time t = 0, 
what is the chance of exactly j transmutations between t = t and 
t = t + τ? 

¹ This law tacitly assumes the products of decay to be non-radioactive. 




This problem, like the one in § 87, deals with a set of events 
which show a definite trend, but there is this vital difference between 
them: the chance of an event taking place during the interval between 
t and t + dt was determined solely by t in the preceding case, and 
was entirely independent of the past history of the system, whereas 
in the present case it is determined solely by the past history of the 
system, and (except in so far as the past history is itself different 
at different instants) is not a function of t. For in the present 
instance the chance of an emission in the interval in question is 
proportional to the amount of stuff left, and that amount is completely 
determined by the number of emissions which have already 
taken place. The problem is no harder to solve, however, than was 
Example 47. 

We start by investigating the chance of just n β-rays between 
t = 0 and t = t. We can do this most easily, because we know 
the number of molecules available for transmutation at both instants, 
N at t = 0, and (since n have been emitted) N − n at t = t. 


Suppose the interval is lengthened to t + dt. Then we have 

$$p(n, t + dt, 0) = p(n, t, 0)\,[\,1 - (N - n)\,k\,dt\,] + p(n-1, t, 0)\,(N - n + 1)\,k\,dt + \cdots,$$

the remaining terms being negligible. If we are interested in n = 0 
this takes the simpler form 

$$p(0, t + dt, 0) = p(0, t, 0)\,(1 - Nk\,dt).$$

From these relations we easily derive two linear differential equations 
of which the solutions are¹ 

$$p(0, t, 0) = e^{-Nkt},$$

$$p(n, t, 0) = e^{-(N-n)kt}\int_0^{t} e^{(N-n)kt}\,p(n-1, t, 0)\,(N - n + 1)\,k\,dt.$$


From this latter formal solution explicit formulae can easily be 
obtained, one by one, for p(1, t, 0), p(2, t, 0), .... They are all 
found to satisfy the law 

$$p(n, t, 0) = C_n^N\,e^{-Nkt}\,(e^{kt} - 1)^n. \tag{113}$$

¹ The arbitrary constants have been determined exactly as in § 85 and § 87. 

This, then, is a general formula for an interval beginning at an 
instant when the number of molecules is known to be N. What 
about the interval from t to t + τ about which our problem asked? 
At the time t there must be some number available. The chance 
that the number is N − n is given exactly by (113). If this number 
is available, the chance of n′ being transmuted within the next 
interval τ is obviously obtained by replacing N and n in (113) by 
N − n and n′, respectively, and t by τ. Hence, making use of the 
principle of alternative compound probabilities, we have 

$$p(n', \tau, t) = \sum_{n=0}^{N-n'} C_n^N\,e^{-Nkt}\,(e^{kt} - 1)^n\;C_{n'}^{N-n}\,e^{-(N-n)k\tau}\,(e^{k\tau} - 1)^{n'}.$$
By noting that $C_n^N\,C_{n'}^{N-n} = C_{n'}^N\,C_n^{N-n'}$, and making certain other 
obvious rearrangements, this becomes 

$$p(n', \tau, t) = C_{n'}^N\,e^{-Nk(t+\tau)}\,(e^{k\tau} - 1)^{n'}\sum_{n=0}^{N-n'} C_n^{N-n'}\,\bigl(e^{k(t+\tau)} - e^{k\tau}\bigr)^n.$$

The summation is now in the form (15): hence it is equivalent to 
$(1 - e^{k\tau} + e^{k(t+\tau)})^{N-n'}$. Some further obvious rearrangements then 
give 

$$p(n', \tau, t) = C_{n'}^N\,\bigl[e^{-kt}(1 - e^{-k\tau})\bigr]^{n'}\,\bigl[1 - e^{-kt}(1 - e^{-k\tau})\bigr]^{N-n'}. \tag{114}$$

This is the general solution of the problem. It is not in the form 
of Poisson’s Law, and cannot be reduced to that form, the reason being 
that the chance of a β-particle being emitted in any interval is very 
definitely dependent upon the occurrences in other intervals. 
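
The reduction of the summation to a closed form can be verified by direct computation for a small sample. In the Python sketch below the closed form is written as a binomial expression in the quantity q = e^(−kt)(1 − e^(−kτ)), which is the chance that a given molecule survives until t and then transmutes within τ; writing it that way is an assumption of this sketch, as are the numerical values N = 12, k = 0.3, t = 1 and τ = 0.7.

```python
import math

def p_from_zero(n, t, N, k):
    """Equation (113): chance of n transmutations in (0, t) starting with N molecules."""
    return math.comb(N, n) * math.exp(-N * k * t) * (math.exp(k * t) - 1) ** n

def p_by_summation(n_prime, tau, t, N, k):
    """Sum over the number n already transmuted at time t, as in the text."""
    return sum(p_from_zero(n, t, N, k) * p_from_zero(n_prime, tau, N - n, k)
               for n in range(N - n_prime + 1))

def p_closed_form(n_prime, tau, t, N, k):
    """Closed form, written as a binomial expression in q."""
    q = math.exp(-k * t) * (1 - math.exp(-k * tau))
    return math.comb(N, n_prime) * q ** n_prime * (1 - q) ** (N - n_prime)

N, k, t, tau = 12, 0.3, 1.0, 0.7   # illustrative values only
worst = max(abs(p_by_summation(j, tau, t, N, k) - p_closed_form(j, tau, t, N, k))
            for j in range(N + 1))
total = sum(p_closed_form(j, tau, t, N, k) for j in range(N + 1))
print(worst, total)
```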


It is only when N is very large and k (the “rate of decay”) 
is very small (so that Nk is of moderate size), that it reduces to 
Poisson’s form. Under these circumstances the number left after 
any physically consequential time t is not very greatly different from 
N, so that the chance of an emission between t and t + dt is virtually 
independent of t. 
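
A numerical comparison shows how this limit is approached. The Python sketch below evaluates the decay law in the binomial form used for the closed solution and sets it beside a Poisson formula with expectation Nkτ; taking Nkτ as the expectation is an assumption that is reasonable only when kt is small. Both parameter sets are hypothetical and are chosen so that Nkτ = 3 in each.

```python
import math

def decay_law(j, tau, t, N, k):
    """Chance of j transmutations in (t, t + tau), starting with N molecules at time 0."""
    q = math.exp(-k * t) * (1.0 - math.exp(-k * tau))
    return math.comb(N, j) * q ** j * (1.0 - q) ** (N - j)

def poisson(j, mean):
    """Poisson Law with the given expectation."""
    return mean ** j * math.exp(-mean) / math.factorial(j)

def worst_gap(N, k, t=1.0, tau=1.5, j_max=12):
    """Largest difference between the decay law and the Poisson Law of mean N k tau."""
    mean = N * k * tau
    return max(abs(decay_law(j, tau, t, N, k) - poisson(j, mean))
               for j in range(j_max + 1))

few_molecules = worst_gap(N=20, k=0.1)          # small sample, rapid decay
many_molecules = worst_gap(N=10**6, k=2e-6)     # large sample, slow decay
print(few_molecules, many_molecules)
```

With few molecules and rapid decay the two laws differ appreciably; with a large sample and slow decay they are practically indistinguishable.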


§ 89. An Approximation to the Poisson Law 


It can be shown that under certain circumstances the 
Normal Law is an acceptable approximation to the Poisson 
Law. We begin with (105), but omit the subscript m from our 
symbol, since we are no longer thinking of it as an approximate 
form of the Binomial Law. We thus get 

$$P(n) = \frac{\varepsilon^{n}\,e^{-\varepsilon}}{n!}. \tag{115}$$




As usual, we denote the deviation from expectation by δ and 
the standard deviation by σ. A simple algebraic computation 
then shows that σ² = ε. 

From this point on, our procedure is exactly parallel to that 
carried out in § 82. We first replace n by the new variable 
y = δ/σ = (n − σ²)/σ, which measures the deviation in terms 
of the standard deviation as a unit. We thus get 

$$P(n) = \frac{\sigma^{2n}\,e^{-\sigma^2}}{n!} = \frac{\sigma^{2\sigma^2 + 2\sigma y}\,e^{-\sigma^2}}{(\sigma^2 + \sigma y)!}.$$


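We next replace the factorial by its Stirling approximation 
and expand the various terms in series. The final result, 
after much tedious algebra, is 

$$P(n) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-y^2/2}\Bigl[\,1 - \frac{1}{\sigma}\Bigl(\frac{y}{2} - \frac{y^3}{6}\Bigr) - \frac{1}{\sigma^2}\Bigl(\frac{1}{12} - \frac{3y^2}{8} + \frac{y^4}{6} - \frac{y^6}{72}\Bigr) + \frac{1}{\sigma^3}\Bigl(\frac{y}{8} - \frac{47y^3}{144} + \frac{37y^5}{240} - \frac{y^7}{48} + \frac{y^9}{1296}\Bigr) + \frac{1}{\sigma^4}\Bigl(\frac{1}{288} - \frac{5y^2}{32} + \frac{347y^4}{1152} - \frac{617y^6}{4320} + \frac{23y^8}{960} - \frac{y^{10}}{648} + \frac{y^{12}}{31104}\Bigr) + \cdots\Bigr]. \tag{116}$$

The first term of this series is exactly the same as the first 
term of the series for the Binomial approximation (102). That 
is, the Normal Law (103) is a satisfactory approximation to 
the Poisson Law when y/σ is not too large. 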

To see just how satisfactory the approximation is, we 
refer to Figs. 26 and 27, in which the values of the Poisson 
Law are represented by circles, while the Normal Law and the 
more complete approximation (116) are shown as continuous 
curves. In Fig. 26 the value of ε is 10, and in Fig. 27 it is 
100. The discrepancies between the true values and those 
given by the Normal Law are considerable in every case, 
though much more noticeably so for the smaller value of ε than 
for the larger one; but near the center of the range, where the 
probabilities are high, the percentage error would not be of 
serious consequence for many purposes. Near the tails, 
however, the percentage error is very large. Therefore the 
approximation should never be used in those regions. The 
complete approximation (116), on the other hand, agrees 
very well with the true values throughout the entire range, 
even in the case of the smaller value of ε. 

[Fig. 26: the Poisson Law (circles) and the Normal approximation (curve); horizontal axis marked 0, 5, 10, 15, ...] 

[Fig. 27: the Poisson Law (circles); horizontal axis marked from 80 to 130.] 

FIG. 27.—APPROXIMATIONS TO THE POISSON LAW. 
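
The behavior described here can be reproduced in figures for ε = 10. The Python sketch below compares the exact Poisson values with the Normal Law and with the Normal Law multiplied by the first two correction terms of the series, those in 1/σ and 1/σ². Stopping at two terms, and the choice of n = 10 (the center) and n = 20 (well out on the tail), are assumptions made to keep the example short.

```python
import math

def poisson(n, mean):
    """Exact Poisson value, computed through logarithms."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def normal(n, mean):
    """Normal Law with expectation and variance both equal to the mean."""
    sigma = math.sqrt(mean)
    y = (n - mean) / sigma
    return math.exp(-y * y / 2) / (sigma * math.sqrt(2 * math.pi))

def series(n, mean):
    """Normal Law times (1 + first two correction terms of the series)."""
    sigma = math.sqrt(mean)
    y = (n - mean) / sigma
    c1 = -(y / 2 - y ** 3 / 6) / sigma
    c2 = -(1 / 12 - 3 * y ** 2 / 8 + y ** 4 / 6 - y ** 6 / 72) / sigma ** 2
    return normal(n, mean) * (1 + c1 + c2)

def relative_error(approx, exact):
    return abs(approx - exact) / exact

MEAN = 10.0
for n in (10, 20):
    exact = poisson(n, MEAN)
    print(n, exact,
          relative_error(normal(n, MEAN), exact),
          relative_error(series(n, MEAN), exact))
```

At the center the Normal Law is already within about one per cent; at n = 20 it is in error by more than half, while the corrected series remains within a few per cent.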


PROBLEMS 


1. Suppose that telephone calls, each of length τ, occur individually 
and collectively at random, the calling rate being n per unit 
time. What is the probability that, at the instant t, there are 
exactly j in progress? 


2. What is the probability that more than j are in progress? 


3. What is the chance of two or more calls in an interval of length 
dt? If every call is followed by a “danger interval” — that is, an 
interval of length dt within which another call, if it arrives, will 
interfere with the first — what is the probability that a call will be 
interfered with during its danger interval? Why are these answers 
not equal? 


4. At the time 0 an observer begins to note the arrival of calls. 
What is the probability that the first call arrives between t and t + dt? 


5. What is the probability that the interval between a call and 
its next succeeding call lies between t and t + dt? 


6. What is the expected time of waiting in Problem 4? The 
expected time interval in Problem 5? 


7. With respect to Problem 6 the following argument can be 
made: The time ¢ = 0, at which the observer enters, must lie in an 
interval between calls. It is just as likely to lie near the beginning 
as the end of the interval. That is, its average position is the middle 
of the interval. Hence the average waiting time of the observer 
will be half the average interval between calls. 

The correct answers to Problem 6 do not satisfy this condition. 
Explain the paradox. 


8. Suppose an exchange is suddenly “cut into service” at the 
height of busy hour traffic. Call the instant t = 0. For negative 
values of t the calling rate is zero. Hence there are no calls in any 
interval. For t > 0 the calling rate is n per unit time: hence Poisson’s 
Law applies to any interval lying wholly in this time. But if an 
interval begins at a time t < 0 and ends at t + τ > 0, the calling 
rate is not constant. What is the probability of just j calls in such 
an interval? 


9. In Problem 8, assume n = 3, τ = 1, and draw curves for 
$P_j(\tau, t)$ covering values of t from −2 to +1, for each of the following 
values of j: 0, 1, 2, 3. Discuss the characteristic features, particularly 
maxima, of these curves. 


10. In Problem 5, what is the most probable length of interval? 
The root mean square length? The expected length? 


11. In the decay exercise of § 88, what amount of substance can 
be expected to remain after a time t? 


12. What proportion of the amount present at time t decays during 
the next second? Note that t does not appear in your answer. 


13. What is the expected time of emission of the first β-particle? 


§ 90. The Normal Law 


So far in this chapter we have been dealing with two dis- 
tribution laws which appear as the consequences of sets of 
assumptions to which the circumstances of engineering prob- 
lems frequently lead us; or at least from which the circum- 
stances of engineering frequently differ in minor respects, the 
importance of which we can to some extent estimate. We 
now come to the consideration of a totally different class of 
distribution functions — no less important, but justified solely 
by their proven utility and not in any way by theoretical 
appropriateness. 

First in importance among these we must list the Normal 
Law (103). It may seem strange to refer to it in that fashion 
after we have twice assigned it more specific significance — 
once as an exact law under assumptions that at least remotely 
resemble the facts of molecular velocities in a gas, and then 
again as an approximate form of either the Binomial or the 
Poisson Law. But the use of the Normal Law is not confined 
to such cases. In addition, there are many problems which 
involve unknown distribution functions — virtually every case 
of the deviation of manufactured parts from the ideal qualities 




which they are designed to have is of this type —and the 
best that we can usually do under such circumstances is to 
choose some function which appears to possess the major 
characteristics that experience has taught us to expect in 
such distributions, and then to arbitrarily fit our data to it. 
It is in such cases that the Normal Law plays the part of a 
purely empirical law. 

Attempts have frequently been made to justify its use in 
this field, but they have rather signally failed to do so on two 
scores: in the first place, experience has shown that most such 
distributions do not conform to even the major characteristics 
of the law; in the second, the assumptions upon which the 
justifications are based are of such an intangible character 
that it is impossible to predict in advance whether or not a 
given scientific situation is one of the appropriate few. Deriva- 
tions based upon postulates which have no concrete physical 
interpretation are of little use to the scientist. We shall 
therefore not concern ourselves with them.¹

Instead, let us write down the principal characteristics of 
the law; for if we cannot determine a priori whether a certain 
set of data ought to conform to it, we can at least check up 
the data, once it has been obtained, to find whether it is of a 
radically different type. The principal characteristics are: 


1. The Normal Law is symmetrical. Negative and positive 
deviations of like magnitude are equally likely to occur. 


2. The Normal Law assigns a finite probability to every 
finite deviation. There are no excluded cases. 


3. In the case of the Normal Law there is just one most
probable result, and this is identical with the first expectation
of the variable.


¹ I do not mean to infer that the study of such proofs is necessarily a waste of time:
I only mean that (unless it contributes to the advancement of the logic or mathe- 
matics per se, with which we are not concerned) it belongs rather to the category of 
scientific recreations than to that of purposeful work. The student who is interested 
in following up the subject will find three excellent, and entirely different, derivations 
in the References for Outside Reading at the end of the chapter. 




Now if a set of data is to be distributed in accordance with 
the Normal Law it must possess all three of these characteristics,
and it requires only a moment's consideration to see
that the possession of all of them is by no means general. Take, 
for example, the case of the height of men. It is absurd to 
speak of a man of negative height: such a thing is not merely 
very infrequent; it simply cannot occur. Yet if height were 
distributed normally, the second property of the Law would 
assign a finite probability to this absurdity. Or, as a further 
example, we may think of the lengths of telephone conversa- 
tions. Here again negative values are meaningless; but more 
than that, actual experience has shown that the conversation 
of most frequent occurrence is very short indeed, certainly 
much shorter than the average. The distribution function 
which these call-lengths obey must therefore possess none
of the three properties.¹
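The two objections can be put into numbers. The sketch below is only an illustration: the mean height of 69 inches, its standard deviation of 3 inches, and the call rate `a` are hypothetical figures chosen for the example, and the exponential law for call lengths is the one mentioned in the footnote.

```python
import math

def normal_cdf(x, mean, sd):
    """Chance that a normally distributed variable falls below x.

    Written with erfc so that very small tail chances are not rounded to zero.
    """
    return 0.5 * math.erfc((mean - x) / (sd * math.sqrt(2.0)))

# Heights: hypothetical mean 69 in., standard deviation 3 in.
# The Normal Law assigns a finite (though absurdly small) chance to a
# negative height.
p_negative_height = normal_cdf(0.0, 69.0, 3.0)

# Call lengths under the exponential law P(T) = a*exp(-a*T) for T >= 0.
a = 0.5                    # hypothetical rate
mean_call = 1.0 / a        # first expectation of T
mode_call = 0.0            # the most probable length is the shortest one
# A Normal Law with the same mean and standard deviation (both equal 1/a)
# would put about 16 per cent of all calls at negative lengths.
p_negative_call = normal_cdf(0.0, mean_call, mean_call)
```

The first figure is positive but far too small to matter in practice; the second shows how badly the Normal Law can misrepresent a variable whose most probable value lies at the end of its range.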

It is an unescapable consequence of these facts that, if 
statistical investigations are to have any degree of generality 
at all, they must deal with other distribution laws as well, 
and in the few remaining sections of this chapter we shall 
present briefly some of the more successful attempts that have 
been made to broaden out the material with which the statistician
may work. But before passing on to this subject,
we must pause for a moment to consider wherein the importance 
of the Normal Law really lies. We have said that as an exact 
law its demonstrations have been of a sadly impractical 
character, and that as an empirical law its own peculiarities 
are of so special a type that it is seldom obeyed; and it might 
appear that this left no field of usefulness for it whatever. To 
form this conclusion, however, would be grossly unfair, for it 
possesses very considerable virtues nevertheless. 

In the first place, it depends upon a single variable y, and 
is therefore very easy to tabulate. In contrast, the Binomial 


¹ In fact, in this particular case a much better law appears to be of the form
$P(T) = a e^{-aT}$ for positive values of $T$, and $P(T) = 0$ when $T$ is negative. See
in this connection a recent paper by Mr. E. C. Molina in the Bell System Technical
Journal for July, 1927.




Law depends upon two variables m and n, the Poisson Law 
upon $\epsilon$ and $n$, and the law (25) upon m, n, p and q. And
this is no mean advantage, for it means that the Normal Law 
is the simplest of all laws to handle in those cases to which 
it does apply. In fact, its superiority in this respect is great 
enough to warrant its use in many places where a better 
approximation to the true distribution could undoubtedly be 
obtained, but where a rough estimate easily gotten is preferable 
to a more exact one which demands far more labor. 

In the second place, as we shall see in our discussion of the 
Gram-Charlier series, it is possible to express many distribu- 
tions which do not possess the three properties listed above 
in terms of the Normal Law and its successive derivatives;
and since the derivatives have also been tabulated, this often 
affords a feasible means of securing greater exactitude. 

In the third place, we have shown that both the Binomial 
and the Poisson Laws, under suitable conditions, approach the 
Normal Law as a limit. In § 101 we shall develop an analogous
theorem of much greater generality; and it has, in fact, been 
shown that a very wide variety of distributions have the 
Normal Law as a sort of common limit to which they all tend. 
This property is probably the most important one of all, for 
though it may be of little use to us in actual computation, 
it seems to say that the Normal Law somehow epitomizes that 
element of accidental distributions which is common to all 
sorts of circumstances. It is, in a sense, the Center of Per- 
spective of probability. 


§ 91. Empirical Families of Curves; Pearson’s Curves 


Among the families of curves which have been set up for 
the purpose of enabling the statistician to deal with a wide 
variety of data, the most empirical of all is that developed by 
Karl Pearson, for the foundation underlying it is only suggestive 
at best. This foundation consists of the observation that, in 
a certain approximate sense which will shortly appear, the 
Normal Law, the Binomial Law, the Poisson Law and the law 




of repeated dependent trials given in (25) all satisfy the 
differential equation 


$$\frac{1}{P}\frac{dP}{dx} = \frac{a + x}{b + cx + dx^2} \qquad (117)$$


for some set of values of the constants a, b, c, d. Certain
solutions of this equation are then sorted out, largely because 
of their algebraic simplicity, to form the desired family of 
distribution types. 

Take, for example, equation (23), which was represented 
graphically in Fig. 7. If we join the tops of consecutive 
ordinates by straight lines, as shown in Fig. 28, we produce a 
polygon, or sort of "broken curve." Of course, the origin
of the Binomial Law is such that the only values of $n$ to which any
significance can be attached are those which correspond to the
"corners" of this "curve"; but we are going to ignore this fact for
a moment, and think only of its appearance as a geometrical diagram.
It is obvious that if we were
to draw a smooth curve, not through the corners themselves,
but through the mid-points of the connecting lines instead, and
if we required this curve to be tangent to each connecting line
at its mid-point, we would get something which possessed the
principal characteristics of the polygon itself. We can actually
carry out this process algebraically by noting that the abscissa
and ordinate of the mid-point are, respectively,


$$x = n - \tfrac{1}{2}$$

and

$$P = \tfrac{1}{2}\left[P_m(n) + P_m(n-1)\right];$$

and that the slope of the connecting line is

$$P_m(n) - P_m(n-1).$$




From these values we easily find that at each mid-point the 
smooth curve of which we have spoken would satisfy the 
differential equation 


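$$\frac{1}{P}\frac{dP}{dx} = \frac{P_m(n) - P_m(n-1)}{\tfrac{1}{2}\left[P_m(n) + P_m(n-1)\right]},$$

which, upon replacing $P_m(n)$ and $P_m(n-1)$ by their algebraic
expressions as given by (23), and changing the variable from
$n$ to $x$, reduces at once to

$$\frac{1}{P}\frac{dP}{dx} = \frac{\left(\tfrac{1}{2} - mp - p\right) + x}{\left(-\tfrac{1}{4} - \tfrac{mp}{2}\right) + \left(p - \tfrac{1}{2}\right)x + (0)\,x^2}.$$

This equation is of exactly the form (117), the quantities
which correspond to the constants a, b, c and d being enclosed
in parentheses.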

It is in this approximate sense that the Binomial Law may 
be said to obey the differential equation (117). 

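It is interesting to note in passing that if $p = \tfrac{1}{2}$, which
makes the Binomial Law symmetrical, and if we write
$y = (x - pm)/\sqrt{(m + 1)p(1 - p)}$, which is very much like
the substitution used in (93), our equation becomes

$$\frac{1}{P}\frac{dP}{dy} = -y,$$

the solution of which is the Normal Law, $P = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}$.¹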


In the case of the law (25) also we are led to the differen- 
tial equation (117), provided we assume the supply of balls 
of each color ($m$ and $n$) and the total number drawn ($p + q$,
which we call r in what follows) to be specified, the variable 


¹ The constant of integration must be put equal to $1/\sqrt{2\pi}$; otherwise the area
under the curve would differ from unity.




being the number of one color obtained (that is, p). In this 
case, however, none of the constants is zero. They are, in fact 


$$a = -\frac{2mr + m + 2r - n}{2(2 + m + n)},$$

$$b = -\frac{2mr + m + n + 1}{4(2 + m + n)},$$

$$c = \frac{m + 2r - n}{2(2 + m + n)},$$

$$d = -\frac{1}{2 + m + n}.$$

The demonstration of the fact that the Poisson Law also, 
in the same approximate sense, obeys the differential equation 
(117) is left as an exercise for the student. 
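The "approximate sense" in which the three discrete laws obey (117) can be checked numerically. The sketch below is not taken from the text: the function names are hypothetical, the Poisson constants are our own working of the exercise by the same mid-point construction, and the hypergeometric constants are those obtained by carrying that construction through for the law (25). In each case the ratio of slope to ordinate at the mid-point $x = n - \tfrac12$ is compared with the right-hand side of (117).

```python
from math import comb, exp, factorial

def pearson_rhs(x, a, b, c, d):
    """Right-hand side (a + x) / (b + c x + d x^2) of equation (117)."""
    return (a + x) / (b + c * x + d * x * x)

def midpoint_lhs(P, n):
    """(1/P) dP/dx at the mid-point of the line joining P(n-1) and P(n)."""
    return (P(n) - P(n - 1)) / (0.5 * (P(n) + P(n - 1)))

def binomial_check(m, p, n):
    """m trials, chance p per trial, n successes."""
    P = lambda k: comb(m, k) * p**k * (1 - p) ** (m - k)
    a, b, c, d = 0.5 - m * p - p, -0.25 - m * p / 2, p - 0.5, 0.0
    return midpoint_lhs(P, n), pearson_rhs(n - 0.5, a, b, c, d)

def poisson_check(eps, n):
    """eps is the expectation of the Poisson Law (constants derived here)."""
    P = lambda k: eps**k * exp(-eps) / factorial(k)
    a, b, c, d = 0.5 - eps, -0.25 - eps / 2, -0.5, 0.0
    return midpoint_lhs(P, n), pearson_rhs(n - 0.5, a, b, c, d)

def hypergeometric_check(m, n, r, k):
    """m and n balls of the two colors, r drawn, k of the first color."""
    P = lambda j: comb(m, j) * comb(n, r - j) / comb(m + n, r)
    s = 2 + m + n
    a = -(2 * m * r + m + 2 * r - n) / (2 * s)
    b = -(2 * m * r + m + n + 1) / (4 * s)
    c = (m + 2 * r - n) / (2 * s)
    d = -1 / s
    return midpoint_lhs(P, k), pearson_rhs(k - 0.5, a, b, c, d)
```

Each function returns the two sides of the equation as a pair; for example `binomial_check(10, 0.3, 4)` gives two values that agree to within rounding. The agreement holds only at the mid-points, which is exactly the limited sense claimed in the text.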

In a vague sort of way, therefore, (117) is characteristic of 
all four of these important types of distribution, and it is 
natural to expect that an empirical family whose members 
are bound together by this same law may possess advantages 
over one with even less by way of logical foundation. It was 
from considerations such as these that Pearson was led to 
examine all the solutions of (117) with a view to finding what 
sort of data they are capable of representing. 

The solutions in question, however, take different forms
according as the quantity $c^2 - 4bd$ is positive, negative or
zero. They are: if $c^2 - 4bd$ is positive,

I-II.  $P = k(x - \alpha_1)^{m_1}(x - \alpha_2)^{m_2}$;  (118)

if $c^2 - 4bd$ is negative,


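IV.  $P = k\left[(x - \alpha)^2 + \beta^2\right]^{m} e^{\gamma \tan^{-1}[(x - \alpha)/\beta]}$;  (119)

and if $c^2 - 4bd$ is zero,

$P = k(x - \alpha)^{m} e^{\beta/(x - \alpha)}$.  (120)

In addition there are degenerate forms: when $d = 0$,

III.  $P = k(x - \alpha)^{m} e^{\beta x}$;  (121)

and when both $c$ and $d$ are zero,

$P = k\, e^{\beta (x - \alpha)^2}$.  (122)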




It need hardly be said that all the constants which appear 
in these equations, with the exception of the constant of 
integration k, are written in place of combinations of a, b, c and
d, and are therefore perfectly arbitrary so long as a, b, c and d
are. As for the k in any equation, it must have such a value
that the area under the curve defined by that equation is unity. 

Pearson chose the first of these as his "Type II" curve in
the special case when $m_1 = m_2$; when $m_1 \neq m_2$ he called it
"Type I"; the next he called "Type IV" in the special case
when $m = -1$; and as "Type III" he used equation (121).
He also listed several other cases, but these will be quite 
enough for our consideration. 

Let us now see what are the characteristics of the various 
Types: 

Taking first Type II, it is obvious that P either vanishes
or becomes infinite at $x = \alpha_1$ and $x = \alpha_2$: which of the two
happens depends upon the sign of the exponent, but whichever
happens at $\alpha_1$ also happens at $\alpha_2$, for it must be remembered
that in the case of Type II the exponents are equal. In no
case, however, may the exponent be less than $-1$, for if it
were, the area under the curve would not be finite.¹


¹ It will be remembered that integrals in which the integrand becomes infinite
(they are commonly called "improper integrals") are defined as limits of integrals
which do not contain the troublesome point. Thus

$$\int_0^b \frac{dx}{x^m}$$

is defined to mean

$$\lim_{\epsilon \to 0} \int_\epsilon^b \frac{dx}{x^m}.$$

Hence

$$\int_0^b \frac{dx}{x^m} = \lim_{\epsilon \to 0} \frac{b^{1-m} - \epsilon^{1-m}}{1 - m},$$

which is either $b^{1-m}/(1 - m)$ or $\infty$, according as $m < 1$ or $m > 1$.

The same argument applies to the attempt to integrate (118): if either $m_1$ or $m_2$
is less than $-1$, the area under the curve is infinite, no matter what value may be
given to k.




The same remarks apply to Type I also, except that it is no 
longer necessary that the curve behave in the same way at 
both end points. Instead it is quite possible for one exponent 
to be positive and the other negative, in which case the function 
vanishes at one end and becomes infinite at the other. Moreover,
the curves, whether of Type I or Type II, have quite
different properties according as the m's lie in the intervals
$-1 < m < 0$, $0 < m < 1$, or $1 < m$. In the first case, the
curve becomes infinite at the end point; in the second it becomes
zero, diving into the axis abruptly; in the last case it becomes
zero, but meets the axis smoothly. As any one of these conditions
at one end can be combined with any one at the other,
nine typical results can be obtained, as shown in Fig. 29.
Of course, the Type II curves are the special symmetrical
cases of Type I.

[Fig. 29. Typical Forms of Pearson Curves, Types I and II: nine panels, one for each pairing of $m_1$ and $m_2$ in the ranges $-1 < m < 0$, $0 < m < 1$ and $1 < m$, each drawn between $\alpha_1$ and $\alpha_2$.]

It is obvious that these curves cover a pretty wide range of
"shapes." They have, however, the common property of
excluding values outside the range $(\alpha_1, \alpha_2)$ completely. We
still need curves which are limited at one end only, and others 
which are not limited at all. 
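The requirement that the area under a Type I or Type II curve be unity fixes k once the end points and exponents are chosen. The sketch below is an illustration under stated assumptions rather than Pearson's own procedure: it writes the second factor of (118) as $(\alpha_2 - x)^{m_2}$ so that the ordinate is positive inside the range (the sign being absorbed into k), evaluates the area by the Beta-function integral, and checks the result by a crude mid-point summation. The function names are hypothetical.

```python
from math import gamma

def type_one_k(a1, a2, m1, m2):
    """k that makes the area of k (x-a1)^m1 (a2-x)^m2 over (a1, a2) unity."""
    if m1 <= -1 or m2 <= -1:
        # The area is infinite unless both exponents exceed -1.
        raise ValueError("both exponents must exceed -1")
    beta = gamma(m1 + 1) * gamma(m2 + 1) / gamma(m1 + m2 + 2)
    return 1.0 / ((a2 - a1) ** (m1 + m2 + 1) * beta)

def type_one_area(a1, a2, m1, m2, steps=20_000):
    """Mid-point summation of the curve; the end points are never evaluated."""
    k = type_one_k(a1, a2, m1, m2)
    h = (a2 - a1) / steps
    total = 0.0
    for i in range(steps):
        x = a1 + (i + 0.5) * h
        total += (x - a1) ** m1 * (a2 - x) ** m2
    return k * h * total
```

For the symmetrical (Type II) case with both exponents equal to 1 on the range (0, 1), `type_one_k(0.0, 1.0, 1.0, 1.0)` gives 6, which is the familiar normalizing factor of $x(1 - x)$.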

Type III supplies the first sort.¹ Again we must distinguish
between values of $m$ in the three ranges $(-1, 0)$,
$(0, 1)$, $(1, \infty)$; and also between positive and negative values
of $\beta$. The various forms which this type may assume are
shown in Fig. 30.


[Fig. 30. Typical Forms of Pearson Curves, Type III: panels for $m$ in the ranges $(-1, 0)$, $(0, 1)$ and $(1, \infty)$, with $\beta > 0$ and with $\beta < 0$.]


Finally, the curves resulting from equation (119) extend
to infinity in both directions. They can only define a finite
area, however, provided $m < -\tfrac{1}{2}$. Pearson chooses $m = -1$,
and writes the equation²

$$P = \frac{k\, e^{\gamma \tan^{-1}(x/\beta)}}{x^2 + \beta^2},$$

¹ Pearson sets $m = -\beta\alpha$. This looks like a restriction on the value of $m$, but
really is not; for any possible curve (121) can be made to coincide with one for which
$m = -\beta\alpha$ by simple translation.

² Again, (119) degenerates by simple translation into an equation without the $\alpha$.




this value of m being particularly desirable because 


Lama 
al 


The two shapes which characterize this type are gotten
when $\gamma$ and $\beta$ have the same sign ($\gamma\beta > 0$), and when they
have not ($\gamma\beta < 0$). They are shown in Fig. 31. Neither of
the two is symmetrical; but there is a degenerate special case
when $\gamma = 0$ which is symmetrical. It, too, is shown in Fig. 31,
in comparison with the Normal Curve (which is dotted).


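[Fig. 31. Typical Forms of Pearson Curves, Type IV: panels for $\gamma\beta > 0$ and $\gamma\beta < 0$, and the symmetrical case $\gamma = 0$ compared with the Normal Curve (dotted).]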


Obviously, the Pearson Type IV gives a much higher chance 
of large deviations than does the Normal Law. 
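How much higher the chance is can be seen from the symmetrical case $\gamma = 0$, $m = -1$, for which the curve is $P = (\beta/\pi)/(x^2 + \beta^2)$. The comparison below is a sketch under one assumption of our own: the two curves are given the same central ordinate, which requires $\beta = \sqrt{2/\pi}$ when the Normal Law has unit standard deviation. Other ways of matching the curves would give other figures.

```python
from math import atan, erfc, pi, sqrt

def normal_tail(t):
    """Chance that a standard normal deviation exceeds t in magnitude."""
    return erfc(t / sqrt(2.0))

def type_iv_symmetric_tail(t, beta):
    """Same chance for P = (beta/pi) / (x^2 + beta^2), i.e. gamma = 0, m = -1."""
    return 1.0 - (2.0 / pi) * atan(t / beta)

# Assumption: match the ordinates at the center, 1/(pi*beta) = 1/sqrt(2*pi).
beta = sqrt(2.0 / pi)

normal_beyond_3 = normal_tail(3.0)
type_iv_beyond_3 = type_iv_symmetric_tail(3.0, beta)
```

On this basis a deviation beyond three units has a chance of about 0.0027 under the Normal Law and about 0.17 under the symmetrical Type IV curve.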

That the Pearson Curves cover a wide range of possibilities 
is obvious at sight, and experience has shown that they can 
be made to fit a large proportion of statistical data surprisingly 
well. How they are used will be discussed in Chapter IX. 


We now turn our attention to other families of curves. 


§ 92. Empirical Families of Curves; The Gram-Charlier Series


We now come to the discussion of a totally different method 
of obtaining a distribution function to represent a set of 
experimental data — one which starts from a basis which is 
much sounder logically than is that underlying the Pearson 
Curves. 




We start with the Normal Law (103), which for the moment
we shall write in the form

$$\phi(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2},$$

in order to conform to the notation which has come to be more
or less standard in dealing with this subject. It is a simple
matter to show that its successive derivatives are


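$$\phi'(y) = -\frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, y,$$

$$\phi''(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,(y^2 - 1),$$

$$\phi'''(y) = -\frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,(y^3 - 3y),$$

or in general,

$$\phi^{(i)}(y) = \frac{(-1)^i}{\sqrt{2\pi}}\, e^{-y^2/2}\, H_i(y),$$

where $H_i(y)$ is written briefly for the "Hermite Polynomial"

$$H_i(y) = y^i - \frac{i(i-1)}{2}\, y^{i-2} + \frac{i(i-1)(i-2)(i-3)}{2 \cdot 4}\, y^{i-4} - \frac{i(i-1)\cdots(i-5)}{2 \cdot 4 \cdot 6}\, y^{i-6} + \cdots$$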


These H's and $\phi$'s possess the remarkable property that
the integral of the product $H_i(y)\,\phi^{(j)}(y)$, taken from $-\infty$
to $+\infty$, is zero, no matter what the values of $i$ and $j$ may
be, so long as they are not equal. This property, which is
known in mathematics as "biorthogonality," is always a very
valuable one; for it permits us to make use of a very simple
method of expanding an arbitrary function, $f(y)$, into a series
of the form

$$f(y) = c_0\,\phi(y) + c_1\,\phi'(y) + c_2\,\phi''(y) + \cdots \qquad (123)$$


To show this, suppose we multiply both sides of this
equation by $H_i(y)$ and then integrate the result term by term
between the limits $-\infty$ and $+\infty$. Because of the fact
that the functions are biorthogonal, every term on the right-hand
side of the equation will vanish except the term for
which the two indices are equal, thus giving us the relation

$$\int_{-\infty}^{\infty} H_i(y)\, f(y)\, dy = c_i \int_{-\infty}^{\infty} H_i(y)\, \phi^{(i)}(y)\, dy,$$

or

$$c_i = \frac{\int_{-\infty}^{\infty} H_i(y)\, f(y)\, dy}{\int_{-\infty}^{\infty} H_i(y)\, \phi^{(i)}(y)\, dy}.$$

Now, by straightforward integration we find that the denominator
of this expression is equal to $(-1)^i\, i!$; wherefore

$$c_i = \frac{(-1)^i}{i!} \int_{-\infty}^{\infty} H_i(y)\, f(y)\, dy.$$


In other words, the coefficients in the series which represents 
$f(y)$ are simply the product of known numerical factors by the
integrals from minus infinity to plus infinity of the products 
of the Hermite polynomials into the function which is to be 
expanded. 
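Both the biorthogonality and the coefficient formula can be verified numerically. The sketch below makes assumptions of its own: the Hermite polynomials are generated from the standard recurrence $H_{k+1} = yH_k - kH_{k-1}$ rather than from the general formula, the infinite range is cut off at $\pm 12$ where the integrands are negligible, and the test function is a series whose coefficients we already know. All function names are hypothetical.

```python
from math import exp, factorial, pi, sqrt

def hermite(i, y):
    """H_i(y) from the recurrence H_{k+1} = y H_k - k H_{k-1}, H_0 = 1, H_1 = y."""
    h_prev, h = 1.0, y
    if i == 0:
        return h_prev
    for k in range(1, i):
        h_prev, h = h, y * h - k * h_prev
    return h

def phi(y):
    """The Normal Law in the notation of this section."""
    return exp(-0.5 * y * y) / sqrt(2.0 * pi)

def phi_deriv(j, y):
    """j-th derivative of phi, using phi^(j) = (-1)^j H_j phi."""
    return (-1) ** j * hermite(j, y) * phi(y)

def integrate(g, lo=-12.0, hi=12.0, steps=24_000):
    """Trapezoidal rule over a range outside which the integrand is negligible."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, steps):
        total += g(lo + k * h)
    return total * h

def coefficient(f, i):
    """c_i = ((-1)^i / i!) times the integral of H_i(y) f(y)."""
    return (-1) ** i / factorial(i) * integrate(lambda y: hermite(i, y) * f(y))

# A function whose expansion is known in advance: c_0 = 1, c_3 = -0.1.
def known_series(y):
    return phi(y) - 0.1 * phi_deriv(3, y)
```

Here `integrate(lambda y: hermite(2, y) * phi_deriv(3, y))` is zero to within rounding, the same integral with both indices equal to 3 is $-3! = -6$, and `coefficient(known_series, 3)` recovers $-0.1$.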


Of course all this is true only provided it is possible to 
expand f(y) into such a series; and of course it is only useful 
provided we are able to carry out the integrations. So it 
will be wise for us to give our attention for a moment to each 
of these questions. 

As for the first, it has been shown that any function f(y) 
which, together with its first two derivatives, is continuous, 
and all the derivatives of which vanish at infinity, is capable 
of representation by means of such a series. These conditions 
would seem to be broad enough to cover most statistical 
studies. 

As for the second, one condition is very obvious: we cannot
evaluate the integrals in question unless we know what the
function $f(y)$ is, and in the case of statistical studies we almost
never do. What we usually know is, that certain observations
have given us certain results. They admittedly do not repre- 
sent the function exactly; but they are the best we have, and 
the problem before us is that of making the very best possible 
use of them. In the attempt to do this, we shall be much 
better pleased with a result that does not obviously disagree 
with the data, than we will with another result that still is 
grossly in error even after we have done our best with it, 
no matter how sound the latter may be theoretically. 

We can perhaps illustrate this idea a bit better by reference 
to a more familiar type of expansion. It is well known that 
the function $e^{-x^2}$ can be represented for every value of $x$ by
means of a Taylor’s Series. If, however, we possessed data 
which obeyed this law, though we did not know it did, and 
if our data were only extensive enough to permit the determina- 
tion of three coefficients in our Taylor’s Series, we would be 
hard put to it to find a series which possessed even the 
major characteristics of the function in question. Certainly 
no three-term polynomial would do. But we might get some- 
thing of practical utility from the use of the function 
$a \cot^{-1}(b + c x^2)$ which, theoretically, is not the right thing
to begin with at all. 
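A small computation makes the point concrete. It is a sketch under assumptions of our own: the function is taken to be $e^{-x^2}$, the three-term polynomial is the start of its Taylor series, and the constants of the inverse-cotangent form are fixed by the simplest choice that matches the value and curvature at the origin ($b = 0$, $a = 2/\pi$, $c = \pi/2$) rather than by fitting data.

```python
from math import atan, exp, pi

def target(x):
    """The law the data are supposed to obey."""
    return exp(-x * x)

def taylor3(x):
    """First three non-zero terms of the Taylor series of exp(-x^2)."""
    return 1.0 - x**2 + x**4 / 2.0

def arccot_fit(x, a=2.0 / pi, b=0.0, c=pi / 2.0):
    """a * arccot(b + c x^2), with arccot(u) = pi/2 - atan(u)."""
    return a * (pi / 2.0 - atan(b + c * x * x))
```

At $x = 2$ the polynomial gives 5, whereas the true value is about 0.018; the inverse-cotangent form gives about 0.10, which is wrong in detail but at least falls toward zero as the function itself does. At $x = 1$ it is within 0.01 of the true value.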

The same general situation exists with reference to the 
Gram-Charlier Series. So far as their application to statis- 
tical data is concerned, fine-spun theories regarding conver- 
gence and the like are likely to be a work of supererogation; 
for it is usually possible to determine no more than three or 
four coefficients at the most, and the practical question is 
simply how far the simple expressions thus derived are capable 
of representing our data. 

But if the series have no substantial theoretical foundation, 
so far as their use in representing empirical data is concerned, 
they do have very substantial merits of another sort. For 
one thing, it has actually been found by experience that even 
the few terms with which we may conveniently deal are 
capable of representing many sorts of data in a quite satis- 
factory fashion; for another, they are easy to compute because, 




as we have already said in our discussion of the Normal Law, 
extensive tables of the functions $\phi^{(i)}(y)$ are available to assist
us.¹ The Pearson types, on the contrary, are sometimes quite
unwieldy. 

How the coefficients are to be determined when the series 
are used in this empirical fashion will appear in Chapter IX. 
For the moment we return to the realm in which the theoretical 
foundation of the series does apply —the realm where the 
true law is known in advance. 


§ 93. Gram-Charlier Approximations to the Binomial and 
Poisson Laws 


In § 82, in an attempt to derive a formula which would approx- 
imate the Binomial Law, we arrived at the formidable series (102), 
the computation of which in any individual case is likely to be 
tedious. But by comparing the coefficients in this series with the 
known expressions for the Hermite polynomials² it is a simple


1 We should also mention the fact that the Gram-Charlier Series has been deduced 
from certain assumptions regarding “‘elementary errors” analogous to those which 
lead to the Normal Law, and therefore possesses whatever merit such an argument 
implies. 

I do not believe it establishes the logical preeminence of the Gram Series as a 
medium for the representation of distributions of data; but, as Dr. W. A. Shewhart 
has pointed out, if the fluctuations in a variable can be represented accurately by 
one or two terms of a Gram Series, the causes underlying those fluctuations must be 
of the same effect as if they were actually “elementary errors.” When this situation 
exists, no amount of data is likely to teach us much about the individual causes of 
variation; when it does mot we may hope to learn something about those causes, 
and, if the variation be an undesirable one, to eliminate them. 


² We have already given in § 92 a general formula for the Hermite polynomials.
We add a list of the first thirteen:

$H_0 = 1$,
$H_1 = y$,
$H_2 = y^2 - 1$,
$H_3 = y^3 - 3y$,
$H_4 = y^4 - 6y^2 + 3$,
$H_5 = y^5 - 10y^3 + 15y$,
$H_6 = y^6 - 15y^4 + 45y^2 - 15$,
$H_7 = y^7 - 21y^5 + 105y^3 - 105y$,
$H_8 = y^8 - 28y^6 + 210y^4 - 420y^2 + 105$,
$H_9 = y^9 - 36y^7 + 378y^5 - 1260y^3 + 945y$,
$H_{10} = y^{10} - 45y^8 + 630y^6 - 3150y^4 + 4725y^2 - 945$,
$H_{11} = y^{11} - 55y^9 + 990y^7 - 6930y^5 + 17325y^3 - 10395y$,
$H_{12} = y^{12} - 66y^{10} + 1485y^8 - 13860y^6 + 51975y^4 - 62370y^2 + 10395$.




matter to throw it into the form 


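$$P_m(y) = \phi(y) - \frac{q - p}{6\sigma}\,\phi'''(y) + \frac{1}{\sigma^2}\left[\frac{1 - 6pq}{24}\,\phi^{(4)}(y) + \frac{(q - p)^2}{72}\,\phi^{(6)}(y)\right]$$

$$\qquad - \frac{1}{\sigma^3}\left[\frac{(q - p)(1 - 12pq)}{120}\,\phi^{(5)}(y) + \frac{(q - p)(1 - 6pq)}{144}\,\phi^{(7)}(y) + \frac{(q - p)^3}{1296}\,\phi^{(9)}(y)\right] + \cdots \qquad (124)$$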


This series, which is of the Gram-Charlier type, is much easier
to evaluate than (102) because of the aid that may be gotten
from tables such as those in Appendix IV. It is particularly
useful when we desire to know the probability of $n$
lying between two preassigned limits $n_1$ and $n_2$. To see why,
we refer back again to Fig. 25, in which the circles represent
the actual values of $P_m(n)$, while the smooth curve
which passes through them gives the approximation (102),
or what amounts to the same thing, the Gram-Charlier
Series (124). We reproduce
part of this figure as Fig. 32, except that the true values of $P_m(n)$,
instead of being plotted as points, are represented by rectangles as
in Fig. 8.

[Fig. 32.]

It is obvious that the area under the smooth curve between
the ordinates at $n_1$ and $n_2$ is smaller than the area of the corresponding
set of rectangles,¹ for the rectangles extend a half unit beyond
these ordinates in each direction. On the other hand, the area under
the smooth curve between the limits $n_1 - \tfrac{1}{2}$ and $n_2 + \tfrac{1}{2}$ is an excellent
approximation to the area of the rectangles, since the little triangular
areas $a_1$, $a_2$; $b_1$, $b_2$; ..., pretty nearly compensate for one
another.

We can, therefore, find a pretty good approximation to the
probability of $n$ lying between $n_1$ and $n_2$ by taking the integral of
(124), using as limits of integration, not the values $y_1$ and $y_2$ that
correspond to $n_1$ and $n_2$, but others which correspond to $n_1 - \tfrac{1}{2}$
and $n_2 + \tfrac{1}{2}$ instead. From the general relation (101) which exists
between $y$ and $n$ we easily find these limits, and write down the
formula

¹ If the meaning of this is not clear, the reader is referred to the last three
paragraphs of § 38.


$$\sum_{n = n_1}^{n_2} P_m(n) = \int_{y_1 - \frac{1}{2\sigma}}^{y_2 + \frac{1}{2\sigma}} P_m(y)\, dy. \qquad (125)$$


We have chosen to speak of Pm(y) as defined by the series (124), 
but the validity of our argument would be just exactly the same if 
we were to use some other definition. For instance, we might use the 
series (102) which is, except for the form in which it is expressed, 
identical with (124). But though such a substitution does not 
impair the validity of the argument, it does affect the ease with 
which computations can be carried out. 

To illustrate this, we may note that if the series (102) were 
integrated it would give another series likewise containing various 
powers of y. To evaluate (125), therefore, we should have to 
compute these powers for both limits of integration and form the 
appropriate sums. On the other hand, the fact that the various 
$\phi$'s in (124) are successive derivatives of the Normal Law makes it
only necessary to reduce the indices by 1 in order to have their
integrals, the integral of $\phi'''(y)$ being $\phi''(y)$, the integral of $\phi^{(4)}(y)$
being $\phi'''(y)$, and so on. Now all these quantities may be taken
directly from the tables contained in Appendix V, even the integral
of $\phi(y)$ being given under the heading $\phi^{-1}(y)$. It therefore becomes
an exceedingly simple matter to carry out the computation of (125). 
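The computation of (125) can be sketched as follows. The code is an illustration under stated assumptions, not a transcription of the tables: it takes the relation between $y$ and $n$ to be $y = (n - mp)/\sigma$ with $\sigma = \sqrt{mpq}$, keeps the series only through the terms in $1/\sigma^2$, obtains the derivatives from the Hermite recurrence instead of from Appendix V, and uses hypothetical function names. The `half_unit` argument allows the half-unit extension of the limits to be switched off for comparison.

```python
from math import comb, erf, exp, pi, sqrt

def hermite(i, y):
    """H_i(y) from the recurrence H_{k+1} = y H_k - k H_{k-1}."""
    h_prev, h = 1.0, y
    if i == 0:
        return h_prev
    for k in range(1, i):
        h_prev, h = h, y * h - k * h_prev
    return h

def phi_deriv(j, y):
    """j-th derivative of the Normal Law: (-1)^j H_j(y) phi(y)."""
    return (-1) ** j * hermite(j, y) * exp(-0.5 * y * y) / sqrt(2.0 * pi)

def integrated_series(y, m, p):
    """Integral of the series from -infinity to y, each index reduced by 1."""
    q = 1.0 - p
    sigma = sqrt(m * p * q)
    big_phi = 0.5 * (1.0 + erf(y / sqrt(2.0)))        # integral of phi itself
    return (big_phi
            - (q - p) / (6 * sigma) * phi_deriv(2, y)
            + ((1 - 6 * p * q) / 24 * phi_deriv(3, y)
               + (q - p) ** 2 / 72 * phi_deriv(5, y)) / sigma**2)

def approx_sum(n1, n2, m, p, half_unit=0.5):
    """Approximate chance of n1 <= n <= n2, with limits pushed out by half_unit."""
    sigma = sqrt(m * p * (1.0 - p))
    lo = (n1 - half_unit - m * p) / sigma
    hi = (n2 + half_unit - m * p) / sigma
    return integrated_series(hi, m, p) - integrated_series(lo, m, p)

def exact_sum(n1, n2, m, p):
    """Direct summation of the Binomial Law."""
    return sum(comb(m, n) * p**n * (1 - p) ** (m - n) for n in range(n1, n2 + 1))
```

For $m = 100$, $p = 0.1$ and the limits 8 and 12, the direct sum is about 0.596; `approx_sum(8, 12, 100, 0.1)` agrees with it to within a few units in the third decimal place, whereas with `half_unit=0.0` the result falls short by about 0.1.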

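As a numerical example we might consider the problem of finding
the chance of $n$ successes in 100 trials of an event the probability of
which is 0.1. For this problem we have $p = 0.1$, $m = 100$, $\sigma = 3$,
whence (124) becomes

$$P_{100}(y) = \phi(y) - \frac{2}{45}\,\phi'''(y) + \left[\frac{23}{10{,}800}\,\phi^{(4)}(y) + \frac{2}{2{,}025}\,\phi^{(6)}(y)\right]$$

$$\qquad + \left[\frac{1}{50{,}625}\,\phi^{(5)}(y) - \frac{23}{243{,}000}\,\phi^{(7)}(y) - \frac{4}{273{,}375}\,\phi^{(9)}(y)\right] + \cdots \qquad (126)$$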


Table XVII illustrates the degree of accuracy which may be 
expected of this formula; but the form of the table requires some
explanation.


TABLE XVII

THE BINOMIAL LAW $C^{100}_{n} (0.1)^n (0.9)^{100-n}$ AND SEVERAL GRAM-CHARLIER
APPROXIMATIONS TO IT

(Errors are in units of the fifth decimal place; a "?" marks a sign or entry that is illegible in the source.)

 n    True Value     Δ0      Δ1      Δ2      Δ3
 0    0.00003       +48     -13      -6      -2
 1    0.00030      +118      -0      -5      -2
 2    0.00162      +218     +32      ?6      +1
 3    0.00589      +285       ?     ?18      +5
 4    0.01587      +212     +52     ?14      +3
 5    0.03387         ?       ?       ?      ?3
 6    0.05958      -491     -95       ?      -5
 7    0.08890      -824    -107     -14      -1
 8    0.11482      -834     -31      ?4      ?1
 9    0.13042      -462     +76     ?12      ?1
10    0.13187      +111    +111      -1      -1
11    0.11988      +592     +53     -14      ?1
12    0.09879         ?       ?       ?      ?2
13    0.07430      +635     -83     ?12      -1
14    0.05130      +370     -59     ?14      -3
15    0.03268       +48      -7      +2      -1
16    0.01929      -130     +30      -8      +2
17    0.01059      -185     +37      -9      +3
18    0.00543      -163     +22      -4      ?1
19    0.00260      -113      +6       ?      -1
20    0.00117       -66      -4      +3      -1
21    0.00050       -34      -6      ?2      ?0
22    0.00020       -15      -5      +1      +0


In the first place, (126) gives the values of $P_{100}(y)$, not $P_{100}(n)$.
To get the latter it is necessary to introduce the proper Jacobian,
which we know to be $1/\sigma = \tfrac{1}{3}$. Hence the values from which the
table is derived are not the numbers given by (126), but one-third 
of them instead. 

In the second place, though we have given in the second column 
the true values of the Binomial Law, the Gram—Charlier approxima- 
tions to it are indicated, not by their actual values, but by their 
errors; for it is the errors in which we are principally interested. 

In the third place, the four columns of deviations correspond:
(1) to the Normal Law, which is represented by the first term
of (124); (2) to the second approximation, obtained by including
the term which has the first power of $\sigma$ in its denominator; (3) to the
approximation obtained by including the second power of $\sigma$ also;
and (4) to the complete expression, so far as we have written it. That
the convergence of the series is a fairly rapid one is obvious from
the rapid vanishing of the $\Delta$'s.¹
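The four columns of deviations can be recomputed directly. The sketch below rests on assumptions of our own: the series is taken in the form of the Edgeworth grouping by powers of $1/\sigma$ with $\sigma = \sqrt{mpq}$ and $y = (n - mp)/\sigma$, the derivatives of the Normal Law are generated from the Hermite recurrence, and the function names are hypothetical. Each deviation is the approximation less the true value.

```python
from math import comb, exp, pi, sqrt

def hermite(i, y):
    """H_i(y) from the recurrence H_{k+1} = y H_k - k H_{k-1}."""
    h_prev, h = 1.0, y
    if i == 0:
        return h_prev
    for k in range(1, i):
        h_prev, h = h, y * h - k * h_prev
    return h

def phi_deriv(j, y):
    """j-th derivative of the Normal Law: (-1)^j H_j(y) phi(y)."""
    return (-1) ** j * hermite(j, y) * exp(-0.5 * y * y) / sqrt(2.0 * pi)

def approximations(n, m=100, p=0.1):
    """The four successive approximations to the chance of n successes."""
    q = 1.0 - p
    sigma = sqrt(m * p * q)
    y = (n - m * p) / sigma
    groups = [
        phi_deriv(0, y),
        -(q - p) / (6 * sigma) * phi_deriv(3, y),
        ((1 - 6 * p * q) / 24 * phi_deriv(4, y)
         + (q - p) ** 2 / 72 * phi_deriv(6, y)) / sigma**2,
        -((q - p) * (1 - 12 * p * q) / 120 * phi_deriv(5, y)
          + (q - p) * (1 - 6 * p * q) / 144 * phi_deriv(7, y)
          + (q - p) ** 3 / 1296 * phi_deriv(9, y)) / sigma**3,
    ]
    partial, out = 0.0, []
    for g in groups:
        partial += g
        out.append(partial / sigma)      # the Jacobian 1/sigma
    return out

def errors(n, m=100, p=0.1):
    """Approximation minus true value, for each of the four approximations."""
    true = comb(m, n) * p**n * (1 - p) ** (m - n)
    return [a - true for a in approximations(n, m, p)]
```

For example, `errors(10)[0]` is about $+0.0011$ and `errors(0)[0]` about $+0.0005$, and the largest deviation over $n = 0, \ldots, 22$ shrinks steadily from the first column to the last.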


¹ To these remarks we may also add the following observations, which will probably
be of less interest to the elementary student than to those already familiar with the 
subject. 

In the first place, various writers, of whom Edgeworth is perhaps the best known, 
have pointed out that to get the “‘ best” results from a Gram-Charlier Series the 
terms should be associated in an order different from the “ natural” one. Specifically, 
the term of zero order must either be used alone, or else in some one of the combinations

0, 3;
0, 3, 4, 6;
0, 3, 4, 6, 5, 7, 9.


Now this is just exactly the way the terms have grouped themselves in (124) and 
(126); and we see at once that the rule may be said to be a natural consequence (in 
the case of Binomial expansions at least) of the attempt to arrange the terms in 
descending powers of $\sigma$.

In the second place, there is a common rule to the effect that the coefficient of 
the term of order 6 is approximately equal to half the square of the coefficient of 
order 3. In the case of (124) and (126) this is identically true, if by the “term of 
order 6" we mean that one which occurs in combination with $\phi^{(6)}$. This, however, is
not the entire term of order 6: the first unwritten term of (124) would also contain
$\phi^{(6)}$, but as it has a higher power of $\sigma$ in its denominator it can be expected to be of
little importance by comparison with the part accounted for by the common rule. 

It would appear, therefore, that to obtain the best results from the use of the 
Gram-Charlier Series, some other order than the natural one is required. Whether 
that suggested by Edgeworth is the “ best in the long run” will depend largely upon 
what we mean by that phrase. Certainly it is not to be expected that any order of 
summation can ever be devised which will not, in an exceptional case, be less exact 
than some other order which happened to be peculiarly appropriate to that exceptional 
case. 

As an illustration of the fact that the grouping of the terms really has a beneficial 
effect at times, we may compare our approximations with that obtained by Mr. Arne 
Fisher in his Mathematical Theory of Probabilities, p. 268. The approximation 
there given was obtained from the use of terms as high as $\phi^{(4)}$, in the natural order,
and therefore occupies an intermediate position between our $\Delta_1$ and $\Delta_2$ columns;
for our grouping of terms is such that we either omit the function of the fourth order
entirely, or else include the sixth also. We can therefore best make our comparison
with these two columns.

Taking the root mean square of the deviations as our criterion, Mr. Fisher's approximation
is in error by 8 in the fourth decimal place (his values are listed to only four
places), while our second approximation is in error by 5, and our third by 1, in the
same place. In other words, a suitable grouping of the terms has given noticeably
greater accuracy to our second approximation than was obtained from Mr. Fisher's
use of the natural order which included one term more.

We may also make use of the same illustration as an example
of the type of accuracy that can be attained when the integrals of
such series are used as approximate expressions for the sum of the
Binomial Series itself. Table XVIII contains, in the second column,
the results of summing the Binomial Series from $n_1$ to 100: that is,
it contains the probability that 100 trials would result in no less than
$n_1$ successes. The remaining columns contain the results of integrating
(126) from the lower limit that corresponds to $n_1 - \tfrac{1}{2}$,
that is, from $y_1 - \frac{1}{2\sigma}$ to infinity. The approximation is not quite
so good as in Table XVII, but it would still be quite sufficient for
many purposes.

Mr. Fisher is, of course, well aware of the benefit to be derived from proper grouping. 
In fact, in a letter to the editor of the Bell System Technical Journal (Vol. 6, 1927, 
pp. 172-180), he has pointed out that there exists a definite relationship between 
the order of magnitude of the so-called “elementary errors” and the arrangement
of the terms in the Gram-Charlier Series. Mr. Fisher informs me that the first 7
approximations are contained in the following combinations, originally given by
Charlier and Jørgensen:


0;
0; 3;
0; 3; 4, 6;
0; 3; 4, 6; 5, 7, 9;
0; 3; 4, 6; 5, 7, 9; 8, 10, 12;
0; 3; 4, 6; 5, 7, 9; 8, 10, 12; 11, 13, 15;
0; 3; 4, 6; 5, 7, 9; 8, 10, 12; 11, 13, 15; 14, 16, 18.

These differ from the groupings to which we have been led only in the matter of
including the entire term of any order in the place where that order first appears,
whereas our scheme leads us to split such terms up into various parts according to
the relative powers of σ which they contain. For example, in our seventh approxima-
tion the order of terms would be

0; 3; 4, 6; 5, 7, 9; 6, 8, 10, 12; 7, 9, 11, 13, 15; 8, 10, 12, 14, 16, 18;

the semicolons being used to mark off groups which have the same power of σ. The
law of formation of either set is obvious.
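
The law of formation can be written out as a short sketch. It rests on an inference from the seventh approximation quoted above, not on a rule stated in the text: the k-th group is taken to hold the orders k+2, k+4, ..., 3k, and the Charlier and Jørgensen arrangement is taken to keep each order only where it first appears. The function names are hypothetical.

```python
def split_groups(k_max):
    """Groups 1..k_max of the split scheme: group k holds orders k+2, k+4, ..., 3k."""
    return [list(range(k + 2, 3 * k + 1, 2)) for k in range(1, k_max + 1)]


def first_appearance_groups(k_max):
    """Same groups, but every order is kept only in the group where it first appears."""
    seen, groups = set(), []
    for group in split_groups(k_max):
        fresh = [order for order in group if order not in seen]
        seen.update(fresh)
        groups.append(fresh)
    return groups


def show(groups):
    """Format a list of groups in the semicolon notation used in the text."""
    return "; ".join(["0"] + [", ".join(map(str, g)) for g in groups])


print(show(split_groups(6)))             # seventh approximation, split scheme
print(show(first_appearance_groups(6)))  # seventh approximation, first-appearance scheme
```

Under these assumptions the first line printed reproduces the seventh approximation as quoted above.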
In his letter to the Bell System Technical Journal Mr. Fisher makes use of the
third of the above relationships and adds a term of order 6 to the values given in
his Mathematical Theory of Probabilities. The result thus obtained is identical with
the A₂ of Table XVII.




We can carry out a similar line of argument with respect to the 
series (116), which was obtained as an approximation to the Poisson 
Law. It is found without difficulty to reduce to the Gram—Charlier 
form 


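$$\phi_0 - \frac{1}{\sigma}\,\frac{\phi_3}{6} + \frac{1}{\sigma^2}\left(\frac{\phi_4}{24} + \frac{\phi_6}{72}\right) - \frac{1}{\sigma^3}\left(\frac{\phi_5}{120} + \frac{\phi_7}{144} + \frac{\phi_9}{1296}\right) + \frac{1}{\sigma^4}\left(\frac{\phi_6}{720} + \frac{13\,\phi_8}{5760} + \frac{\phi_{10}}{1728} + \frac{\phi_{12}}{31104}\right) - \cdots$$
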


from which either the probability of taking a particular value or
the chance of it lying within a specified range could be computed.
The Poisson Law is of sufficient importance, however, that extensive
tables have been prepared, not only of $P(n) = \varepsilon^n e^{-\varepsilon}/n!$ itself, but
also of the function $\Pi(n) = \sum_{x \ge n} P(x)$ which represents the probability
that n is not less than a specified value. Appendices VI and VII
are skeleton tables of this sort. Much more elaborate ones can be
obtained, two of which are referred to in the References for Outside
Reading. Obviously, with such tables available, we have very little
use for a series expansion.


TABLE XVIII

CERTAIN GRAM-CHARLIER APPROXIMATIONS TO THE SUM OF A BINOMIAL SERIES

 m₁   True Value       A₀            A₁            A₂            A₃

  0    1.00000      −0.00023      +0.00020      +0.00002      −0.00001
  5    0.97629     [illegible]   [illegible]   [illegible]   [illegible]
 10    0.54871     [illegible]   [illegible]   [illegible]   [illegible]
 15    0.07257     [illegible]   [illegible]   [illegible]   [illegible]
 20    0.00198      −0.00121      −0.00015      +0.00011      +0.00009
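
The kind of comparison recorded in Table XVIII can be sketched for the Normal Law term alone. The sketch assumes 100 trials, as stated in the text, and a probability of success of 0.1, which is an inference from the tabulated true values and is not stated in this passage. It sums the Binomial Series exactly and integrates the Normal Law from the limit corresponding to m₁ − ½; the higher approximations are not reproduced. The function names are hypothetical.

```python
from math import comb, erfc, sqrt

N_TRIALS = 100
P_SUCCESS = 0.1  # assumption: inferred from the tabulated true values


def exact_tail(m1, n=N_TRIALS, p=P_SUCCESS):
    """Sum of the Binomial Series from m1 to n: chance of no less than m1 successes."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m1, n + 1))


def normal_tail(m1, n=N_TRIALS, p=P_SUCCESS):
    """Normal Law integrated from the limit corresponding to m1 - 1/2 to infinity."""
    sigma = sqrt(n * p * (1 - p))
    y = (m1 - 0.5 - n * p) / sigma
    return 0.5 * erfc(y / sqrt(2))


# Columns: m1, exact sum, error of the Normal Law approximation.
for m1 in (0, 5, 10, 15, 20):
    exact = exact_tail(m1)
    print(f"{m1:3d}  {exact:.5f}  {normal_tail(m1) - exact:+.5f}")
```

The half-unit shift in the lower limit is the same correction term that appears in the limits of the integral discussed in the text.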


§ 94. Empirical Families of Curves; Transformation of Variable 


Let us think for the moment of any two distribution curves,
such as those shown in Fig. 33. They are decidedly dissimilar,
but being distribution functions they necessarily enclose equal
areas. Suppose, now, that we start from the left and lay off
on each an equal area A, as, for example, the shaded areas
in the figures. Finally, suppose we call the y and x which
bound these areas “corresponding” values. Such a process
could obviously be carried out for any area A (or for any x),
thus relating a y to every x. Let us imagine the corresponding
pairs to be plotted as in Fig. 34. By this means we have
found a function of x the distribution of which is exactly in
accordance with the right-hand curve of Fig. 33.


Fig. 33.


In other words, no matter what the distribution of x may be,
there is some function y(x) of such a nature that the distribution
of y will conform to any law we may desire. For example, we
might cause it to be represented by a straight line, so that equal
ranges of y were equally probable.
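
The matching of equal areas can be sketched numerically. The example below is illustrative only: the exponential law for x and the two target laws for y are hypothetical choices, and the function names are not from the text. The area lying to the left of x is computed first, and the y that bounds the same area under the desired curve is then looked up.

```python
from math import exp, log
from statistics import NormalDist


def corresponding_value(x, area_left_of_x, y_bounding_area):
    """Return the y that cuts off the same area A on its curve as x does on its own."""
    area = area_left_of_x(x)       # area A lying to the left of x
    return y_bounding_area(area)   # the y which bounds an equal area


def exponential_area(x):
    """Hypothetical distribution of x: area under e^(-x) from 0 to x."""
    return 1.0 - exp(-x)


def straight_line_inverse(area):
    """Target law in which equal ranges of y on (0, 1) are equally probable."""
    return area


normal_inverse = NormalDist().inv_cdf  # target law: the Normal Law

x_median = log(2.0)  # half the area of the exponential curve lies to its left
print(corresponding_value(x_median, exponential_area, straight_line_inverse))
print(corresponding_value(x_median, exponential_area, normal_inverse))
```

Any pair of laws could be substituted, provided the area function of the first and the inverse area function of the second are available.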

Little has ever been done in the matter of finding to what
extent this form of transformation might be utilized in deriving
distribution laws for empirical data,¹ but it has been
used now and then to throw unusual types of distribution
curves into forms that conformed better to the established
empirical types.

In a way this too is a means of obtaining a family of curves,
but in this instance the family includes almost anything conceivable.
We have, in fact, progressed from a few types,
derived from a given differential equation and containing only
a limited number of parameters, through a somewhat more
general set with more parameters and therefore theoretically
capable of finer gradations, to a method which, in its broad
outlines at least, is capable of yielding anything at all which
can conceivably be a distribution curve. But this progression
to greater and greater flexibility has been accompanied by a
loss of definiteness of type which in practice seems to largely
nullify any value it might otherwise possess. After all,
statistical data is not exact. It is merely representative, and
about all that we can hope to do with it (if we may be
allowed for the sake of emphasis to use an expression which
smacks somewhat of exaggeration) is to find the type to
which it belongs.


¹ It would seem that there might be some point to the attempt to treat statistical
data by finding what transformation would result in a completely random distribution:
that is, in equal probabilities for all intervals within a certain range.


PROBLEMS
1. Show that Poisson’s Law leads to the differential equation (117).
2. Find the value of k in (118).
3. Find the first two expectations of (x − a) in (118).
4. Find a general formula for the nth expectation of (x − a) in (121).
5. In the case of (121) let δ be the deviation of x from its expectation,
measured in terms of its standard deviation. Write the equation
of P(δ).


REFERENCES FOR OUTSIDE READING


I. On the Normal Law:
1. Coolidge: Probability, pp. 101-119.
2. Lévy: Calcul des Probabilités, Part I, Chapter IV, and
Part II, Chapters IV and V.
3. Czuber: Wahrscheinlichkeitsrechnung, pp. 287-299.


II. Pearson’s Curves:
4. Pearson: Mathematical Contributions to the Theory of
Evolution, Philosophical Transactions, A, Vol. 186
(1895), pp. 343-414; Vol. 197 (1901), pp. 443-459;
Vol. 216 (1916), pp. 429-457.


III. Gram-Charlier Series:
5. Arne Fisher: The Mathematical Theory of Probabilities,
Chapters XIII and XIV.


IV. General:
6. Rietz: Report on Statistics, Bulletin of the American
Mathematical Society, Vol. 30 (1924), pp. 416-453;
Mathematical Statistics (No. 3 in the series of Carus
Mathematical Monographs).


V. Tables:
7. Pearson: Tables for Biometricians and Statisticians, Cam-
bridge University Press.
8. Jørgensen: Mathematical-Statistical Tables, D. Van
Nostrand, New York.


CHAPTER IX 
CURVE FITTING


§95. The Primary Problem 


We have said that many probabilities cannot be found by 
the application of the definition of a probability, and that in 
such cases experiment is our best guide. We have also seen, 
in our discussion of Bernoulli’s Theorem, that the most we can 
conclude from experiment is that the results are unlikely to differ
much from the true probability. Naturally, we are interested 
to know how unlikely and how much. The two things are of 
course very intimately related — there is a certain probability 
of deviating from the truth by any mentionable amount — and 
thus the problem that presents itself is to find the relation 
between these two. 

When we can get the necessary information, Bayes’ Theorem 
tells us how sure we may be of the estimate to which an experi- 
ment has led us. But it requires that we know how likely 
that same estimate was before the experiment was performed, 
and we do not usually possess that information. Hence 
Bayes’ Theorem is seldom capable of assisting us. What we 
really need is some method of judging the accuracy of our 
estimate without knowing what its inherent probability was 
before the experiment was performed. The present chapter, 
in the main, is devoted to methods which have been developed 
for this purpose. 

From the very start, however, we must keep in mind this 
thing: It is a merit of Bayes’ Theorem, not a weakness, that 
it takes account of the inherent likelihood of an event. It 
would be well to read again § 52 and note that the information 
sought does not depend solely on the experimental evidence,
but on all the other bits of knowledge which go into the making
up of our advance judgment as to the goodness of our coin. 


It is a matter of cold fact that the reliance which we dare 
place in the result of an experiment depends upon the inherent 
plausibility of the result as well as upon the accuracy with which 
it has been attained. Any substitute for Bayes’ Theorem which 
does not require the knowledge of the a priori probabilities must 
therefore be incapable of giving the a posteriori probabilities.
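
The dependence of the a posteriori probability upon the a priori probability can be seen in a small sketch. Every number in it is hypothetical: a coin is supposed to be either true (probability of heads 0.5) or biased (0.6), and an experiment is supposed to have given 60 heads in 100 throws. The function name is likewise an assumption.

```python
from math import comb


def posterior_biased(prior_biased, heads, throws, p_fair=0.5, p_biased=0.6):
    """A posteriori probability that the coin is the biased one, by Bayes' Theorem."""

    def likelihood(p):
        return comb(throws, heads) * p**heads * (1 - p) ** (throws - heads)

    weight_biased = prior_biased * likelihood(p_biased)
    weight_fair = (1 - prior_biased) * likelihood(p_fair)
    return weight_biased / (weight_biased + weight_fair)


# The same hypothetical experiment, judged with different a priori probabilities.
for prior in (0.5, 0.1, 0.001):
    print(prior, round(posterior_biased(prior, heads=60, throws=100), 4))
```

The experimental evidence is identical in the three cases; only the advance judgment differs, and the conclusion differs with it.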


Being, therefore, unable to get a mathematical measure of 
the assurance with which we may accept our estimate because 
we do not first possess a mathematical measure of its inherent 
plausibility, we turn to the task of finding the best possible 
makeshift. Fortunately the makeshift is not a very unsatis- 
factory one when its meaning is clearly understood, but we 
must repeat that it does not tell us how probable our estimate is. 
Nothing can tell us that which does not require the a priori 
probabilities. 

In entering upon this subject we are embarking upon the 
study of statistical methods. We cannot cover that study 
completely, for the mere description and illustration of the 
processes of computation alone would occupy an entire book, 
and an adequate treatment of the algebraic discussions upon 
which those methods are founded would occupy several. Our 
purpose must be, first, to give a correct picture of what we may 
expect to learn from such statistical studies; second, to direct 
the student to the sources where the detail can best be acquired 
when needed; and third, to explain the more important tech- 
nical terms employed, so that he will not find those sources 
unintelligible. 


§ 96. The Hypothetical “Universe of Data,” or “Population,”
and the “Sample”


When an “experiment” is performed we really sort out one
of a group of possibilities. Thus, when a penny turns up
“heads” we sort out one of two possibilities; or when we
measure the resistance of an electrical device we sort out one
of an infinity of possibilities. I do not mean to call attention
to the fact that the device might really have most any resist-
ance, and that we are determining what it really is. I mean
that, due to the uncertainties of measurement, we might have
gotten some other answer: in other words, that our mistake in
measurement is only one of a number of possible mistakes.
Theoretically we may take the number of possibilities to be
infinite, though ordinarily, due to the fact that our instruments
have a “least count,” this will not be true.

When a punch press turns out a stamping, its size is one 
of a number of sizes that might have appeared. 

When a man dies, his age is one of those to which he might 
have lived. 

Each of these is, in a way, an “experiment.” But experi-
ments need not be so simple. The day’s output of a punch
press is, so far as dimensions are concerned, an aggregate
of data: a table of numbers. This table, however, is only one
of a number of tables that might have been obtained. It, too,
is an “experiment” that could have resulted otherwise.

The number of heads and tails that come from n throws of
a penny is only one of a number of possible results. And
so on.

Thinking along these lines leads us quite naturally to the
conception of a vast storehouse of possibilities from which, by
our experiment, we have drawn one blindly. This hypothetical
storehouse we may call the “universe” (or “population”)
from which our “sample” was drawn. Naturally, we must
imagine it to contain all possible results in proportions equal
to their respective probabilities, which will ordinarily require
us to think of an infinity of items in the storehouse (or,
“individuals” in the “population”). Moreover, as we have said,
the individual item may be quite complicated: it may be a
sort of bundle of things. Thus each individual in the popula-
tion may be “m heads and 12 − m tails,” if we are thinking of
what can happen in twelve throws of a penny.

These words “universe” or “population,” “individual,”
and “sample” are in constant use in statistical literature, and
they are really quite valuable, for they enable us to talk quite
abstractly about events of such diverse sorts that no sufficiently
general word could otherwise be found for them. We have
introduced them as relating to a hypothetical storehouse the
properties of which are determined by the probabilities, known
or unknown, which govern our experiment. It is only fair
to say that they did not so originate. They are the product
of a statistical school which, if it does not actually deny that a
probability exists before an experiment is performed, at least
regards the experiment as the only feasible means of arriving
at it. If I interpret them correctly, the adherents of this
school attribute to a “population” a degree of logical reality
which transcends that of abstract “probability.” To them,
the “population” or “universe” is the actuality, while
“probability” is relegated to a shorthand form of statement
for “the relative frequency of like individuals in the popu-
lation.”

Though I appreciate the practical usefulness of the “universe”
concept in aiding imagination, I do not see that the
logic of a subject is in any way bettered by substituting for an
abstraction (probability) that sort of quasi-concreteness which
dreams possess: that is, the mental image of a “universe”
which would be concrete enough if it existed, but which never
does, and frequently could not, exist.


§ 97. The Accepted Criterion as to Goodness of Fit


We may now state the form of solution which is adopted
as a standard substitute for Bayes’ Theorem. It consists of


(a) Making use of our experimental data to form a more
or less plausible estimate of the probabilities which the experi-
ment was designed to measure; and


(b) Computing the chance that another experiment, so con-
ducted that its probabilities were really equal to these estimates,
would lead to a result that is at least as improbable as the one
under discussion.


Or we may express the matter in terms which are more com-
monly used by statisticians as follows:


(a) We consider the results of our experiment and from
them estimate the proportions in which the different kinds of
individuals are contained in the population; and


(b) We compute the chance that another sample, taken
from this assumed population, would contain the various kinds
of individuals in proportions that are no more probable than
those in the experimental sample under discussion.


A simple illustration will help in getting these ideas fixed.
Suppose we were attempting to measure the probability of a
certain event occurring, and had conducted an experiment
which gave 20 successes in 50 trials. Suppose, moreover,
that because of certain collateral evidence, which we need not
discuss, we reached the conclusion that a fair estimate of the
probability was 1/3. From Table X we find that if this estimate
is correct the chance of getting 20 successes in 50 trials is only
0.0704. The particular result in question is therefore not
very likely. But by glancing over the table we quickly
observe that no other result is much more likely, and many
of them are even less likely. Obviously under these circum-
stances it is unfair to conclude that our estimate is an unreason-
able one. The question that really presents itself is, how
likely would we be to get a result the probability of which is
no greater than 0.0704. We can easily answer this question
by adding together the probabilities of all possible results
except those from 14 to 19, thus getting the answer 0.368.
Obviously, then, the chance of getting a result that was at
least as unlikely as 20 successes is big enough that we need
not discard our assumed probability as an untenable one.
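
The computation just described can be sketched directly from the Binomial Law instead of from Table X. The sketch assumes the estimated probability to be 1/3 and the experiment to be 20 successes in 50 trials; the function names are hypothetical. It adds together the probabilities of every possible result that is no more probable than the observed one.

```python
from math import comb


def binomial_probability(k, n, p):
    """Chance of exactly k successes in n trials under the Binomial Law."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def chance_at_least_as_unlikely(observed, n, p):
    """Total probability of all results no more probable than the observed one."""
    # A small relative tolerance guards against rounding when comparing equal values.
    cutoff = binomial_probability(observed, n, p) * (1 + 1e-9)
    return sum(
        prob
        for prob in (binomial_probability(k, n, p) for k in range(n + 1))
        if prob <= cutoff
    )


n, p, observed = 50, 1 / 3, 20
print(binomial_probability(observed, n, p))
print(chance_at_least_as_unlikely(observed, n, p))
```

Selecting results by their probability, and not by their distance from expectation, is what excludes exactly the results from 14 to 19 in this case.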

Or, to take another simple example, suppose we have dealt
a pack of cards and have observed the order in which they fell.
Suppose by a priori reasoning we have formed the estimate
that all orders are equally likely. On the basis of this estimate
we compute the chance of getting the observed result and find
it to be 1/52!, which is inconceivably small. Are we therefore
justified in discarding our hypothesis? Certainly not, for any
other possible order would be just as improbable. The thing
that counts is, that on the basis of our hypothesis we were sure
to get as unlikely a result as this.


The accepted form of solution tells us, then, not how likely
it is that our estimates are correct, nor how likely our experimental
result was if the estimates are correct, but instead it tells
us the chance of getting a result at least as unusual as the one
we obtained, if the estimates are correct.


The first step in our process — that of forming an estimate 
of our probabilities —is likely to present itself in a wide 
variety of forms. Sometimes it requires the estimation of 
only one probability, as in the first example above; sometimes 
it requires the estimation of many, as in the card illustration 
where we really estimated 52! probabilities at once, making 
them all equal; sometimes the number may be infinite, as in 
finding the chance of a metal stamping differing from standard 
by a specified amount. The last case, indeed, requires that 
we give the equation of a distribution curve, rather than the 
values of certain discrete numbers. 

Naturally the methods of forming our estimates of these
probabilities are going to vary widely, not only with the
number of variables with which we must deal, but also with
the amount of collateral information which we may possess and
which influences our choice. For this part of our work the
various distribution functions discussed in Chapter VIII will
be our principal stock of tools.

Once our estimate has been formed, however, the other half 
of the process follows the same set of ideas pretty consistently, 
no matter whether we may be dealing with a single prob- 
ability or with an entire distribution curve. It will be our 
purpose, so far as possible, to cause this underlying unity to 
show through the confusion of algebraic detail which we cannot 
entirely avoid. That purpose can probably best be accom- 
plished by means of a set of examples arranged in increasing 
order of complexity. 


§ 98. Some Instructive Illustrations; The Biased Die


EXAMPLE 49.—There is a case on record where a die was thrown 
315,672 times, with the result that either 5 or 6 appeared 106,602 times. 
Was the die true, and, if not, what was the probability of the appearance 
of 5 or 6? 


If the die was a true die the expectation was just one-third
the number of throws, or 105,224. The difference is not
very great (about 1.3 per cent), certainly not great enough
to convince us offhand that the die was bad. We therefore
begin with the assumption that it was good, and attempt to
find out whether that assumption is plausible. To this end
we shall apply our criterion of goodness of fit, and shall com-
pute the probability that an experiment conducted with a true
die would give a result that is at least as unlikely as 106,602
5’s or 6’s.

This is not very difficult. In fact, the case is one to
which the Binomial Law applies directly, the number of
trials being 315,672 and the (assumed) probability 1/3. But
with so large a number of trials the Binomial Law is known
to be very accurately approximated by the Normal Law,
provided we choose our unit of measure properly. That is,
we must first find the deviation of our result from expectation,
which is obviously 1,378, and then we must express it in terms
of the standard deviation as a unit. We have seen in § 82,
however, that the standard deviation of the Binomial Law
is √(np(1 − p)), which in this case is found to be 264.9.
Measured in terms of this unit the deviation from expectation
is y = 5.20. Obviously, larger deviations than this are less
probable and smaller ones more probable; so our problem
becomes that of finding the chance of a deviation at least as
big as 5.20.

This is identically the same problem as was met in § 93, 
except that in the present instance no very high degree of 
accuracy is needed, so that the first term of the series (124), 
which is just the Normal Law, is quite sufficient. Moreover 
we can quite properly ignore the small correction term
1/(2√(np(1 − p))) that appears in the limits of the integral
(125). Hence, with accuracy enough for our present pur-
poses, we may say that the chance of a deviation larger than
5.20 is

$$\frac{1}{\sqrt{2\pi}} \int_{5.20}^{\infty} e^{-y^2/2}\, dy = \phi_{-1}(\infty) - \phi_{-1}(5.20),$$

where φ₋₁(y), as in § 93, refers to the function tabulated in
Appendix V.


We have thus taken care of deviations which are positive 
and greater than 5.20. But to the extent to which the Normal 
Law is an approximation to the Binomial there are also negative 
deviations that have just the same probabilities. Hence the
total chance of getting a result less likely than that stated in 
the example is just twice what we have written above. 

We could, of course, compute this result by the use of
Appendix V. However, the integral

$$\Phi(y) = \frac{2}{\sqrt{2\pi}} \int_{y}^{\infty} e^{-y^2/2}\, dy \tag{127}$$

is of so frequent occurrence that it is worthy of a table of its
own, which is given in Appendix IV; and from this table we
find at once that the probability in question is Φ(5.20), which
is obviously less than 0.0001. From a larger table its value
could be found to be 0.000,000,2. Hence the chance of a
true die giving as improbable a result as did our experiment
is only about one in five million. It is therefore quite likely
that the die is asymmetrical unless there is a very powerful
a priori reason for thinking otherwise. As we have no such
powerful prejudice, we accept our first estimate of p to be
unlikely, and seek for a better one.
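
The arithmetic of this example can be retraced in a few lines. The sketch below uses the complementary error function of the standard library in place of Appendix IV, which is a substitution and not the method of the text; the variable and function names are hypothetical.

```python
from math import erfc, sqrt

throws, observed = 315_672, 106_602
p = 1 / 3  # assumed probability of a 5 or 6 for a true die

expected = throws * p                # expectation for a true die
sigma = sqrt(throws * p * (1 - p))   # standard deviation of the Binomial Law
y = (observed - expected) / sigma    # deviation measured in units of sigma


def two_sided_chance(y):
    """Chance of a Normal deviation at least as large as |y|, in either direction."""
    return erfc(abs(y) / sqrt(2))


print(round(expected), round(sigma, 1), round(y, 2))
print(two_sided_chance(y))
```

The factor of two for negative deviations is already contained in `two_sided_chance`, since erfc(y/√2) equals twice the upper tail of the Normal Law.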

Suppose we adopt, as our second estimate of the probability
in question, the proportion of times a 5 or 6 was observed, for
we know from Bernoulli’s Theorem that this is not likely
to differ much from the true value of p. This gives us
p = 0.337,699. But upon this assumption we find that y = 0.
Naturally any experiment which we might perform would
show a deviation as large as this; or in other words Φ(y) is,
for this case, unity. But surely this does not mean that our
new estimate is absolutely accurate. Instead it merely reflects
the fact that we have artificially forced an agreement between
the estimate and the experiment by computing the one from
the other.

Later on, in dealing with more complicated examples, we
shall find that we must always compensate for any such forcing
and it will be well for the student to keep in mind until then the
present illustration, in which the absurdity of accepting the
uncompensated Φ(y) is quite apparent. The very simplicity
of our example, however, makes the present an unsuitable time
to discuss the matter. Instead we shall circumvent the dif-
ficulty by another method of approach.


TABLE XIX

             Expected Number               Standard
Assumed p    of 5’s or 6’s     Deviation   Deviation       y        Φ(y)

  0.333         105,119          +1483       264.9       +5.599    0.0000
  0.335         105,750          + 852       265.2       +3.213    0.0013
  0.337         106,382          + 220       265.6       +0.828    0.4076
  0.339         107,013          − 411       266.0       −1.545    0.1224
  0.341         107,645          −1043       266.4       −3.915    0.0001


We wish to know how far we can trust this experimental
approximation to the probability. Surely we can place no
confidence in the last 9 of the 0.337,699, for if we had tossed the
die just once more we would of necessity have had a frequency
ratio of either 0.337,696 or 0.337,702. On the other hand, to
write merely 0.33 would be unduly conservative, for we have
seen that the departure from 0.333,333 is almost certainly
significant, and that departure occurs only in the third place.

We answer this question by assuming various probabilities
to be the true one, just as we have already assumed first
1/3 and then 0.337,699 to be the true one, and finding whether
the deviations of the experimental results from these assumed
probabilities can be regarded as significant. We list the com-
putations involved in Table XIX.

It is obvious that the results which we have obtained were 
to be expected if the probability is 0.337 or 0.339, but that 
they would be quite unusual for p = 0.335, and exceedingly so 
for the other values. Hence, unless there is a powerful reason 
for doubting it, we are forced to conclude that p is very prob- 
ably between 0.335 and 0.340. 
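
The computations of Table XIX can be repeated for any assumed p with a short sketch. It uses the complementary error function in place of Appendix IV, and the function name is hypothetical. Because the expectation is not rounded to a whole number before the deviation is taken, the last digit of y or Φ(y) may differ slightly from the tabulated figures.

```python
from math import erfc, sqrt

throws, observed = 315_672, 106_602


def table_row(p):
    """Expectation, deviation, standard deviation, y and Phi(y) for an assumed p."""
    expected = throws * p
    deviation = observed - expected
    sigma = sqrt(throws * p * (1 - p))
    y = deviation / sigma
    return expected, deviation, sigma, y, erfc(abs(y) / sqrt(2))


for p in (0.333, 0.335, 0.337, 0.339, 0.341):
    expected, deviation, sigma, y, chance = table_row(p)
    print(f"{p:.3f} {expected:9.0f} {deviation:+7.0f} {sigma:6.1f} {y:+7.3f} {chance:.4f}")
```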

This is one way of answering our question, and, indeed,
virtually the only one open to us; but we have made the
computations unduly laborious, as we shall readily see by
making two observations about the above figures. In the
first place, in looking up the probabilities in Appendix IV we
have made no use of the sign of y. Therefore y’s of equal
magnitude but opposite sign are equally likely; or we should
rather say, the deviations to which they correspond are equally
likely. Why, then, can we not choose that Φ which we shall
accept as marking the boundary between admissible and
inadmissible p’s, look up the corresponding y, and find from it,
first the deviations, and then the p’s, which we are to regard
as limiting our range? That is, why can we not work backwards
from Φ to the corresponding p?

The hypercritical answer is, that to get the deviations from
the y we should have to multiply y by certain standard devia-
tions and specifically by those standard deviations which corre-
spond to the p’s for which we seek, and which are therefore as
yet unknown. But a second observation here comes in to save
us from this difficulty: the standard deviations are almost
identical for all the p’s which we have considered. It therefore
makes very little difference whether we use one of them or
the other. In particular, we could quite properly use one
computed from the observed ratio, 0.337,699.

Let us carry out this computation. 

Let us agree to admit as plausible any value of p for which
Φ(y) exceeds 0.01. From Appendix IV we find the corresponding
value of y to be 2.576. Next we compute an approximate
standard deviation by using for p the experimental approxima-
tion 0.3377. The result is 265.7. Multiplying this by
y = 2.576 we find that our allowable deviations are ± 684.
This means that the limiting expectations are 106,602 ± 684,
and that therefore the limiting p’s are (106,602 ± 684)/315,672
= 0.3377 ± 0.0022.

If, then, p is less than 0.3355 or greater than 0.3399 there 
is less chance than one in a hundred that an attempt to repeat 
the experiment would give a result that differed from expecta- 
tion by as much as did the one under discussion. 
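
Working backwards from Φ to the limiting p’s can be sketched as follows. The inverse of the Normal Law from the standard library stands in for the lookup in Appendix IV, and the standard deviation is computed once from the observed ratio, as the text allows. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

throws, observed = 315_672, 106_602


def limiting_values(chance):
    """Limits on p outside which the two-sided chance Phi(y) falls below `chance`."""
    y = NormalDist().inv_cdf(1 - chance / 2)       # y for which Phi(y) = chance
    ratio = observed / throws                      # observed frequency ratio
    sigma = sqrt(throws * ratio * (1 - ratio))     # treated as constant over the range
    half_width = y * sigma / throws
    return ratio - half_width, ratio + half_width


low, high = limiting_values(0.01)
print(round(low, 4), round(high, 4))
```

Holding the standard deviation fixed is what makes the inversion a single multiplication; where it varies appreciably with p the limits would have to be found by trial.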

We could even reduce the labor still farther by choosing in
place of 0.01 as the particular probability which we will begin
to regard as “unusual,” that probability which corresponds to
y = 1. (It is actually 0.3173.) We thus save ourselves the
necessity of finding y from Appendix IV, and we also save a
multiplication, getting our ± term by simply dividing the
standard deviation by the number of trials. In our example
this gives 265.7/315,672 = 0.000,84.

This last is the usual practice, the answer to the problem
being written 0.3377 ± 0.000,84. Without any further expla-
nation the reader then understands that this ± term corre-
sponds to y = 1, and forms his judgments accordingly. In
particular, if he wishes to set such limits upon p that, if p is
really outside them, there is less chance than 0.001 of getting
a sample which deviates as much from expectation as did his
experiment, he would look up in Appendix IV the value 3.291
which corresponds to Φ = 0.001 and multiply the ± term by
it, thus getting as his limits p = 0.3377 ± 0.0027.
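
The correspondence between y and Φ on which this convention rests can be checked without a table. The sketch below again substitutes standard-library functions for Appendix IV; the function names are hypothetical, and the ± term is the one computed in the text.

```python
from math import erfc, sqrt
from statistics import NormalDist


def chance_for(y):
    """Two-sided chance Phi(y) of a Normal deviation at least as large as y."""
    return erfc(y / sqrt(2))


def y_for(chance):
    """Inverse lookup: the y whose two-sided chance equals `chance`."""
    return NormalDist().inv_cdf(1 - chance / 2)


unit_term = 265.7 / 315_672           # the ± term, corresponding to y = 1

print(chance_for(1.0))                # chance attached to the bare ± term
print(y_for(0.01), y_for(0.001))      # multipliers for stricter limits
print(y_for(0.001) * unit_term)       # widened ± term for a chance of 0.001
```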


§ 99. Discussion of Example 49 


There are several things that should be said about the 
arguments which have been used in the last section. 

To begin with, it should be noted that we have done two
things that are, in a sense at least, quite distinct. First, we
attempted to answer the questions, “Is 1/3 a sensible value for
p?” and “Is 0.337,699 a sensible value for p?” Then we
attempted to answer the question, “Within what limits have
we succeeded, by our experiment, in confining p?” The
first question asks about the legitimacy of our guesses, the
second asks about the precision of our experiment; and the
elaboration of each of these points of view leads to a branch
of the Theory of Probability which is of quite sufficient breadth
to warrant its treatment as a separate subject of study. The
first leads to generalized concepts regarding Goodness of Fit;
the other to the subject of Precision of Statistics.

We shall not in this text be concerned with the subject of
Precision of Measurement; but having introduced this simple
example, it is wise for us to note particularly the basis on which
we justified its several steps. First, we felt assured that we
were dealing with a case of independent trials under identical
conditions; and that therefore our data obeyed the Binomial
Law. The Normal Law entered only as an approximation to
the Binomial. In § 82, in discussing such uses of the Normal
Law, we saw that the approximation was fair so long as y³/σ was
small. This would seem to justify its use in the present case
since y³/σ is less than 0.7 in all of our computations. Second,
we made use of the fact that the standard deviation was
approximately constant over the range of values of p which
interested us, justifying this by the fact that we had already
computed enough standard deviations in Table XIX to know
that it was true.

It is entirely possible, however, to set up illustrations in 
which these approximations are not so well justified, and 
hence it is unwise to make use of such methods without keeping 
in mind the conditions which underlie them. The student to 
whom the subject of Precision of Statistics is of interest 
should follow the matter further in some treatise on the sub- 
ject, such as those given at the end of the chapter under the 
References for Outside Reading. 

Finally, with respect to the first half of our discussion, in
which we sought to determine whether such a guess as p = 1/3
or p = 0.337,699 was tenable, we must again repeat that a
high value of Φ is only significant (1) provided this high value
of Φ has not been artificially brought about through the use
of our data in determining p, and (2) provided there is no
a priori prejudice against the particular value of p which is
being considered. As we proceed with the chapter we shall
learn how to compensate for the bias that is brought about by
the use of our data in the computation of our probabilities;
but we can do nothing whatever to take account of their
inherent likelihood. If, therefore, this inherent likelihood is
to receive any consideration at all, it must be in the mind
of the investigator when he comes to the interpretation of his
results.


§ 100. Some Instructive Illustrations; Weldon’s Dice Data


The figures which we gave in Example 49 were totals 
obtained from an experiment made by the English biologist, 
Weldon, who performed it under circumstances somewhat 
different from those stated in the example. Instead of using 
one die, he used twelve, throwing all twelve at once and 
recording the number of 5’s and 6’s that appeared. Table XX
contains a summary of his results. We shall now turn our 
attention to the data in this form, considering in particular 
the two questions which follow: 


Example 50. Is Weldon’s dice data, as contained in Table XX,
consistent with the assumption that the twelve dice were unbiased?


Example 51. Is Weldon’s dice data, as contained in Table XX,
consistent with the assumption that the probability of 5 or 6 had the
same value 0.3377 for each of the twelve dice?¹


¹ If we admit that the dice are very probably all different, and denote the probabilities
of showing 5 or 6 by $p', p'', \ldots, p^{(12)}$, we can so determine these $p$’s that
all of Weldon’s observed values agree with their expectations. This is exactly anal- 
ogous to what we did in the latter half of §98. Before we could place much reliance 
in our results, however, we should be forced to find within what limits each of these 
values would have to be confined if the actual observations were not to be too improb- 
able. That is, we should have to determine the precision of this experimental deter- 
mination of the various p’s. 

This procedure is probably a more logical one to follow than either of those outlined 


278 PROBABILITY AND ITS ENGINEERING USES 


We shall find that it is best to carry these two examples 
through simultaneously, as most of our remarks about one 
of them will be equally true of the other also. 


TABLE XX

WELDON’S DICE DATA

Number of      Observed      Frequency
5’s or 6’s     Frequency     Ratio

     0             185       0.007033
     1           1,149       0.043678
     2           3,265       0.124116
     3           5,475       0.208127
     4           6,114       0.232418
     5           5,194       0.197445
     6           3,067       0.116589
     7           1,331       0.050597
     8             403       0.015320
     9             105       0.003991
    10              14       0.000532
    11               4       0.000152
    12               0       0.000000

  Total         26,306


We note, to begin with, that when the twelve dice are
thrown the chance that just $j$ show 5’s or 6’s is, in either case,
given by the Binomial Law¹

$$p_j = \binom{12}{j}\, p^j (1 - p)^{12 - j},$$

the only difference being that the $p$ is 1/3 in one case and 0.3377
in the other. The second columns of Tables XXI and XXII
are obtained by direct computation from this formula.


in the examples, but unfortunately it requires a great deal of computation in finding
the first estimates of the $p$’s, and requires a more complicated form of statement
regarding the limits to which they must be confined. In fact, it is not a practical
method at all, unless the problem justifies considerable labor.

The questions which we have formulated are, on the other hand, not at all difficult
to deal with from a computational standpoint, though they lead to some theoretical
considerations which are not altogether simple.


¹ We would ordinarily write $P(j)$ instead of $p_j$, but the uses which we are to make
of the symbol in the present case are such that it is desirable to use the simpler symbol.
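The direct computation behind the probability and expected-frequency columns of Tables XXI and XXII can be sketched as follows. This is only an illustration in modern terms: the function names `binomial_probabilities` and `expected_frequencies` are hypothetical, and the biased case uses the rounded value 0.3377 quoted in Example 51, so the last printed digit may differ slightly from the tables.

```python
from math import comb

THROWS = 26306   # Weldon's total number of throws (Table XX)
DICE = 12        # dice thrown at once

def binomial_probabilities(p, n=DICE):
    """Return [p_0, ..., p_n]: the chance that exactly j of n dice show a 5 or 6."""
    return [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

def expected_frequencies(p, throws=THROWS):
    """Expected number of throws showing exactly j successes."""
    return [throws * pj for pj in binomial_probabilities(p)]

true_dice = binomial_probabilities(1 / 3)      # hypothesis of Example 50
biased_dice = binomial_probabilities(0.3377)   # hypothesis of Example 51

# Print the two probability columns side by side.
for j, (a, b) in enumerate(zip(true_dice, biased_dice)):
    print(f"{j:2d}  {a:.6f}  {b:.6f}")
```

Multiplying each probability by 26,306, as `expected_frequencies` does, gives the expected-frequency column of the corresponding table.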


TABLE XXI

DISCUSSION OF WELDON’S DICE DATA IF DICE WERE TRUE

Number of                   Observed     Expected      Deviation from
5’s and 6’s   Probability   Frequency    Frequency     Expectation      Divergence
    $j$          $p_j$        $n_j$      $\epsilon(n_j)$    $\delta_j$     $\delta_j^2/\epsilon(n_j)$

     0         0.007707        185         202.75       - 17.75          1.554
     1         0.046244      1,149       1,216.50       - 67.50          3.745
     2         0.127171      3,265       3,345.37       - 80.37          1.931
     3         0.211952      5,475       5,575.61       -100.61          1.815
     4         0.238446      6,114       6,272.56       -158.56          4.008
     5         0.190757      5,194       5,018.05       +175.95          6.169
     6         0.111275      3,067       2,927.20       +139.80          6.677
     7         0.047689      1,331       1,254.51       + 76.49          4.664
     8         0.014903        403         392.04       + 10.96          0.306
     9         0.003312        105          87.12       + 17.88          3.670
    10         0.000497         14          13.07       +  0.93          0.066
    11         0.000045          4           1.19       +  2.81 }        6.143
    12         0.000002          0           0.05       -  0.05 }

                                        26,306.02                $\chi^2$ = 40.748


Next we must remember that each of Weldon’s 26,306
throws constituted an independent trial, and that therefore
the expected number of throws showing just $j$ 5’s or 6’s is
$26{,}306\, p_j$. These expected frequencies are shown in the fourth
columns of the tables. The third columns contain the observed
frequencies, as taken directly from Table XX, while in the
fifth columns are listed the deviations from expectation, $\delta_j$.

This completes that part of our discussion of Examples
50 and 51 which concerns itself with estimating the magnitudes
of the various probabilities. We must next attempt to
find out how plausible the estimates are.


TABLE XXII

DISCUSSION OF WELDON’S DICE DATA IF DICE WERE EQUALLY BIASED

Number of                   Observed     Expected      Deviation from
5’s and 6’s   Probability   Frequency    Frequency     Expectation      Divergence
    $j$          $p_j$        $n_j$      $\epsilon(n_j)$    $\delta_j$     $\delta_j^2/\epsilon(n_j)$

     0         0.007123        185         187.38       -  2.38          0.030
     1         0.043584      1,149       1,146.51       +  2.49          0.005
     2         0.122225      3,265       3,215.24       + 49.76          0.770
     3         0.207736      5,475       5,464.70       + 10.30          0.019
     4         0.238324      6,114       6,269.35       -155.35          3.849
     5         0.194429      5,194       5,114.65       + 79.35          1.231
     6         0.115660      3,067       3,042.54       + 24.46          0.197
     7         0.050549      1,331       1,329.73       +  1.27          0.001
     8         0.016109        403         423.76       - 20.76          1.017
     9         0.003650        105          96.03       +  8.97          0.838
    10         0.000558         14          14.69       -  0.69          0.032
    11         0.000052          4           1.36       +  2.64 }        4.688
    12         0.000002          0           0.06       -  0.06 }

                                        26,306.00                $\chi^2$ = 12.677


§ 101. An Approximation to the General Multinomial Law


Viewed in a general way, Example 50 deals with a complete 
set of mutually exclusive events, for on a single throw of the 
twelve dice there must be either no successes, or one, or two, 
or some other number not exceeding twelve. Moreover, 
Weldon’s 26,306 throws constitute repeated independent trials 
of exactly the sort contemplated in § 26. Hence the chance
of each member of the set occurring with exactly a specified
frequency can be determined from (24). In particular we
could compute, if we desired, the chance that another 26,306
throws conducted under the conditions laid down in Example 50
would exactly reproduce Weldon’s results. That, however, is
not quite what we need. Instead we want to know the chance
that such a test would give a result that is no more likely than
Weldon’s, and this would seem to require the addition of a
great many terms of the form (24). We must therefore seek 
for some simple approximation which will bear to the Multi- 
nomial Law (24) a relation analogous to that of the Normal 
to the Binomial. 

Example 51 also deals with a complete set of mutually ex- 
clusive events, and to this extent is identical with 50. But 
the two differ in the same important respect that made the 
second discussion of Example 49 differ from the first: in 51 
a certain degree of agreement has been forced upon Weldon’s 
data by the act of computing the number 0.3377 from it. 
To be exact, we have forced an agreement between the average 
number of successes and the number expected, so that the 
relation 


$$\frac{1}{m} \sum j\, n_j = \sum j\, p_j,$$

or, if we prefer to write it so,

$$\sum j\, n_j = m \sum j\, p_j, \tag{128}$$


is satisfied. It is therefore not fair to use, as a criterion for 
the plausibility of our assumed conditions, the unconditional 
probability of getting a result that is as unusual as Weldon’s. 
Instead we must get the conditional probability that a test, 
conducted with the probabilities listed in Table XXII, but
required to satisfy the auxiliary condition (128), would give a 
more unusual result than Weldon’s. 
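The way in which the number 0.3377 forces relation (128) upon the data can be checked directly from Table XX. The sketch below is illustrative only; the variable names (`p_hat`, `total_successes`) are assumptions of this example, not notation from the text.

```python
# Observed frequencies n_0 ... n_12 from Table XX.
observed = [185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 14, 4, 0]

m = sum(observed)                                              # total number of throws
total_successes = sum(j * n for j, n in enumerate(observed))   # sum of j * n_j
mean_successes = total_successes / m                           # average successes per throw
p_hat = mean_successes / 12                                    # estimate of p for a single die

# For the Binomial Law with 12 dice the sum of j * p_j equals 12 * p,
# so relation (128) holds as soon as p is set equal to p_hat.
lhs = total_successes      # left side of (128)
rhs = m * 12 * p_hat       # right side of (128)

print(m, total_successes, round(p_hat, 6))
```

Because `p_hat` is computed from the same data against which it is later tested, the agreement of `lhs` and `rhs` carries no evidential weight; this is the point the text makes about the auxiliary condition.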

Had we used a more elaborate formula than the Binomial 
in arriving at our estimates — for example, had we admitted 
the possibility of the dice not being identical—we might 
have forced an agreement between experiment and estimate 
in even more respects, and in that case our criterion for good- 
ness of fit would be the conditional probability if the hypothetical 
test were required to satisfy the same set of auxiliary conditions 
as were used in building up the estimate. Hence for the purposes 
of our study we need, not only an approximation to (24), but 
also an approximation to a certain sort of conditional probability
as well. We shall find, however, that the derivation
of the conditional probability will be a very simple matter 
once the unconditional one has been obtained. 

Let us, then, consider a set of $s$ independent events,
for which the individual probabilities of occurrence are
$p_1, p_2, \ldots, p_s$. If $m$ independent trials are made, the chance
that the first of these events occurs just $n_1$ times, the second
just $n_2$ times, and so on, is, by (24) and (5),

$$P(n_1, n_2, \ldots, n_s) = \frac{m!}{n_1!\, n_2! \cdots n_s!}\; p_1^{n_1} p_2^{n_2} \cdots p_s^{n_s}. \tag{129}$$

It must be remembered, of course, that

$$n_1 + n_2 + \cdots + n_s = m. \tag{130}$$


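We now assume that every $n_j$ is large enough that it can,
without serious error, be replaced by its Stirling approximation.¹
Introducing these approximations, and making some
simple rearrangements, (129) becomes

$$(\sqrt{2\pi m})^{\,s-1} \sqrt{p_1 p_2 \cdots p_s}\; P = \left(\frac{m p_1}{n_1}\right)^{n_1 + \frac12} \left(\frac{m p_2}{n_2}\right)^{n_2 + \frac12} \cdots \left(\frac{m p_s}{n_s}\right)^{n_s + \frac12}. \tag{131}$$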


Next we replace each $n_j$ by the corresponding deviation measured
in terms of its standard deviation. That is, we write

$$y_j = \frac{\delta_j}{\sigma_j} = \frac{n_j - m p_j}{\sigma_j},$$

where $\sigma_j^2 = m p_j (1 - p_j)$. This gives us, for each factor of
(131), a new factor of the form

$$\left(1 + \frac{\sigma_j y_j}{m p_j}\right)^{-\left(m p_j + \sigma_j y_j + \frac12\right)},$$

and we seek an approximate value of this when $m$ is large.


¹ This means that we shall not often want to use the resulting approximation
when any $n_j$ is less than 2.

We have carried out similar transformations often enough
that there should be no need for extensive explanation. We
take the logarithm of the entire group of factors, expand each
term in a series, and collect the results according to descending
powers of $m$. The final result is

$$\log\left[(\sqrt{2\pi m})^{\,s-1} \sqrt{p_1 p_2 \cdots p_s}\; P\right] = -\sum \left(\sigma_j y_j + \frac{\sigma_j^2 y_j^2}{2 m p_j} + \text{terms of lower order}\right). \tag{132}$$

Next we go back to (130) and replace all the $n$’s by their
corresponding $y$’s. The result is

$$\sum \sigma_j y_j + m \sum p_j = m,$$

whence, since $\sum p_j = 1$, it follows that $\sum \sigma_j y_j = 0$. Therefore,
when we remember the value of $\sigma_j^2$, (132) becomes
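$$P = \frac{1}{(\sqrt{2\pi m})^{\,s-1} \sqrt{p_1 p_2 \cdots p_s}}\; e^{-\frac12 \sum (1 - p_j)\, y_j^2}. \tag{133}$$

The form of the exponent to which we are thus led suggests
the substitution of a new variable $x_j = y_j \sqrt{1 - p_j}$ in place
of $y_j$. A little later we shall discuss what this new variable
signifies. For the moment we had best keep our attention
focused upon the purely mathematical ideas. Let us, then,
change to $x_j$ in place of $y_j$, thus obtaining

$$P(n_1, n_2, \ldots, n_s) = \frac{1}{(\sqrt{2\pi m})^{\,s-1} \sqrt{p_1 p_2 \cdots p_s}}\; e^{-\frac12 \sum x_j^2}, \tag{134}$$

where

$$x_1 = \frac{n_1 - m p_1}{\sqrt{m p_1}}, \quad x_2 = \frac{n_2 - m p_2}{\sqrt{m p_2}}, \quad \ldots, \quad x_s = \frac{n_s - m p_s}{\sqrt{m p_s}}. \tag{135}$$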
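How closely (134) follows the exact Multinomial Law (129) can be checked numerically. The sketch below is a rough illustration under assumed values: three classes with probabilities 0.2, 0.3, 0.5 and 600 trials, none of which comes from the text. The function names are hypothetical, and the exact law is evaluated through logarithms of factorials to avoid overflow.

```python
from math import exp, lgamma, log, pi, prod, sqrt

def exact_multinomial(counts, probs):
    """Exact Multinomial Law (129), computed through log-factorials."""
    m = sum(counts)
    log_p = lgamma(m + 1)
    for n, p in zip(counts, probs):
        log_p += n * log(p) - lgamma(n + 1)
    return exp(log_p)

def normal_approximation(counts, probs):
    """Approximation (134), with the x_j defined as in (135)."""
    m = sum(counts)
    s = len(counts)
    # Sum of the squares of the x_j.
    r2 = sum((n - m * p) ** 2 / (m * p) for n, p in zip(counts, probs))
    prefactor = 1.0 / ((2 * pi * m) ** ((s - 1) / 2) * sqrt(prod(probs)))
    return prefactor * exp(-r2 / 2)

probs = (0.2, 0.3, 0.5)           # assumed class probabilities
at_centre = (120, 180, 300)       # every n_j equal to its expectation
off_centre = (130, 170, 300)      # a moderate deviation

for counts in (at_centre, off_centre):
    ratio = exact_multinomial(counts, probs) / normal_approximation(counts, probs)
    print(counts, ratio)
```

With class counts in the hundreds the two values should agree to within about one per cent; the agreement deteriorates as any $n_j$ becomes small, which is the limitation noted in connection with Stirling's approximation.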


This is the approximation to (129) for which we have been 
seeking. Before attempting to use it, let us take up the 
question, to which we referred at the beginning of the section, 
of the changes that must be made in it when our probabilities 
are subject to auxiliary conditions such as (128). 

However many of these auxiliary conditions there may be, 
let us call the satisfying of them “event B.” Then the thing 
which we desire to find is the probability of obtaining exactly
the values $n_1, n_2, \ldots, n_s$ if event $B$ occurs. By Bayes’
Theorem we have at once, however, the relation

$$P_B(n_1, n_2, \ldots, n_s) = \frac{P(n_1, n_2, \ldots, n_s)\; P_{n_1, n_2, \ldots, n_s}(B)}{\sum P(n_1, n_2, \ldots, n_s)\; P_{n_1, n_2, \ldots, n_s}(B)},$$

it being understood, of course, that the summation is to be
taken over every possible set of values of the $n$’s. If, then,
we can find the conditional probabilities which occur on the
right-hand side of the equation, we shall be able to get the
formula for which we seek. This, however, is an exceedingly
simple matter. A set of values of the $n$’s either does or does
not satisfy the auxiliary conditions. If it does, $P_{n_1, n_2, \ldots, n_s}(B)$
is unity; if it does not, zero. Hence we get the simpler expression

$$P_B(n_1, n_2, \ldots, n_s) = \frac{P(n_1, n_2, \ldots, n_s)}{\sum P(n_1, n_2, \ldots, n_s)},$$

the summation this time being taken over only those sets
which satisfy the conditions $B$.


Now, no matter what set of values of the $n$’s we may choose,
so long as it is an admissible set, the denominator of this
fraction is always the same. It is, in fact, a constant of such
a nature that, if the unconditional probability of any admissible
set of values is divided by it, the result is the conditional
probability of that same set. It will therefore be simpler for
us to write simply

$$P_B(n_1, n_2, \ldots, n_s) = K\, P(n_1, n_2, \ldots, n_s),$$

since we can always determine the value of the constant $K$
when we need it by means of the fact that the sum of such
terms, taken over every admissible set of $n$’s, must be unity.
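The determination of $K$ by summing over the admissible sets can be made concrete on a case small enough to enumerate. The following sketch assumes a toy problem (6 trials, three classes, one auxiliary condition of the linear kind used in the text); the numbers and the function names are hypothetical and serve only to illustrate the argument.

```python
from itertools import product
from math import factorial, prod

def multinomial(counts, probs):
    """Exact Multinomial Law (129) for one set of class counts."""
    coefficient = factorial(sum(counts))
    for n in counts:
        coefficient //= factorial(n)   # each partial quotient is an integer
    return coefficient * prod(p ** n for p, n in zip(probs, counts))

def conditional_law(m, probs, weights, target):
    """Conditional probabilities of all sets with sum(n_j) = m and sum(a_j n_j) = target."""
    admissible = {}
    for counts in product(range(m + 1), repeat=len(probs)):
        if sum(counts) != m:
            continue
        if sum(a * n for a, n in zip(weights, counts)) != target:
            continue
        admissible[counts] = multinomial(counts, probs)
    K = 1.0 / sum(admissible.values())   # makes the admissible probabilities sum to unity
    return {c: K * P for c, P in admissible.items()}, K

probs = (0.2, 0.3, 0.5)
law, K = conditional_law(m=6, probs=probs, weights=(0, 1, 2), target=7)
for counts, p_b in sorted(law.items()):
    print(counts, round(p_b, 4))
```

Every admissible set is multiplied by the same `K`, so the relative likelihood of any two admissible sets is unchanged by the condition.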


This effects a very great simplification in our problem, for
it means that, except for a multiplicative constant, our probabilities
have exactly the same formal definition no matter
whether they are subject to auxiliary conditions or not.
Because of this we may as well write our approximation (134)
in the form

$$P(n_1, n_2, \ldots, n_s) = K e^{-\frac12 \sum x_j^2}, \tag{136}$$

which may then serve our purpose in every case provided the
$K$ is given a suitable value; and by a suitable value we shall
mean in every case that value which makes the sum of the
probabilities unity.


§ 102. The Measure of the Goodness of Fit, $P(> \chi^2)$


In the discussion of Tables XXI and XXII we are actually
interested in just 13 different variables, which we have denoted
by the symbols $n_0, n_1, \ldots, n_{12}$; but it will be wise for us to
think for a moment in terms of only three in order that we
may be able to visualize certain geometrical ideas. Suppose,
then, that we had only the three variables $n_1, n_2, n_3$ so that
(136) reduced simply to

$$P(n_1, n_2, n_3) = K e^{-\frac12 (x_1^2 + x_2^2 + x_3^2)}.$$

If two sets of values of these $n$’s are equally likely, the exponential
factors in their probabilities must be equal. Conversely,
any two sets of $n$’s are equally likely for which the relation

$$x_1^2 + x_2^2 + x_3^2 = r^2 \tag{137}$$

is satisfied. Written explicitly in terms of the $n$’s this relation
becomes

$$\frac{(n_1 - m p_1)^2}{m p_1} + \frac{(n_2 - m p_2)^2}{m p_2} + \frac{(n_3 - m p_3)^2}{m p_3} = r^2. \tag{138}$$

If we were to represent sets of $n$’s by points in three-dimensional
space, all those points which lay upon the ellipsoid
defined by (138) would have the same probability $P = K e^{-r^2/2}$,
and we see at once that the smaller the value of $r^2$ is the
smaller the ellipsoid and the larger the probability.
Corresponding to different values of $r$ there are, therefore,
a set of ellipsoids defined by the equation (138), all having
the same center $(m p_1, m p_2, m p_3)$, and no two intersecting.
These ellipsoids form a sort of nest, one within another, of
such a nature that the probability $P$ decreases progressively
as we go outward from their common center.


Suppose, now, that an experiment has given us a particular
set of values of the three $n$’s, and that we have computed the
sum of the squares of the three $x$’s which correspond to these
values and found it to be $\chi^2$. Suppose, moreover, that we
have made estimates of the probabilities which underlay the
experiment and are attempting to check the reasonableness
of these estimates in accordance with the criterion laid down
in § 97. What we must do is to add together the probabilities
of all admissible sets of values which are less likely to occur
than the experimental one. We know, however, that the
points which correspond to these sets all lie outside the ellipsoid
for which $r = \chi$. Hence to check the plausibility of our
estimates we need only add together the probabilities corresponding
to all those admissible points for which $r > \chi$.

As a method of carrying this out, we shall replace the
summation over discrete points by an integration over continuous
space, just as in § 93 we replaced a summation with
respect to a single variable by an integral. To do this, however,
we must first multiply (136) by the Jacobian of the
transformation from the $n$’s to the $x$’s in order to obtain an
expression for $P(x_1, x_2, \ldots, x_s)$. By actual computation
from (135) this Jacobian is found to be $(\sqrt{m})^{s} \sqrt{p_1 p_2 \cdots p_s}$.
As this is a constant so far as the $x$’s are concerned its product
by the $K$ which is already present in (136) is just another
constant, and we need no new symbol to represent it; for we
must eventually compute the right value of $K$ by making the
sum of the admissible probabilities unity, and any multipliers
which we may drop now will automatically reappear in this
process. Hence we write

$$P(x_1, x_2, \ldots, x_s) = K e^{-r^2/2}, \tag{139}$$

where $r^2$ is written briefly for the sum of the squares of $x$’s.

Returning to our three dimensional case, we notice that
what was the ellipsoid (138) in the space in which the $n$’s
were represented becomes the sphere (137) in $x$-space, with
its center at the origin and its radius equal to $r$. This, indeed,
is just the significance of the $x$’s which we introduced into
(133) merely for simplicity of expression: they are deviations
measured in such units that equal “vector deviations” are
equally likely, no matter what their “directions.”

Our integration, therefore, extends over all those admissible
values which lie outside a sphere of radius $\chi$. But before
we can carry it out, we must know just what regions contain
these admissible values. To this end we return to the two
equations (128) and (130) and notice that they are both of the
form

$$\sum a_j n_j = m \sum a_j p_j. \tag{140}$$

In (130) the $a$’s are all unity; in (128) $a_j = j$. Later on, in
our discussion of the methods of obtaining empirical equations
to represent statistical data, we shall find that all our estimates
are arrived at through the use of equations of this form. We
therefore restrict our discussion to this type of auxiliary
condition, which becomes, in terms of the $x$’s,

$$\sum b_j x_j = 0,$$

where the $b$’s are constants related to the $a$’s by the rule
$b_j = a_j \sqrt{m p_j}$.

In our three-dimensional case, any one of these equations
is of the form

$$b_1 x_1 + b_2 x_2 + b_3 x_3 = 0,$$

and represents a plane passing through the origin of coordinates.
If there is only one such condition, then, the admissible
points must all lie upon such a plane, and our integral is not a
volume integral of (139) outside a certain sphere, but a surface
integral outside a certain circle. If there are two such conditions,
the admissible points are those which lie in both planes,
and therefore upon the line in which they intersect. In that
case our integral reduces to the integral of (139) over those
portions of a line further removed from the origin than a
certain experimentally determined point.
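That conditions of the form (140) become planes through the origin in $x$-space can be verified on Weldon's data. In the sketch below, which is illustrative only, the thirteen classes are left ungrouped and $p$ is estimated from the data so that both (128) and (130) hold; the helper name `constraint` is hypothetical.

```python
from math import comb, sqrt

# Observed frequencies n_0 ... n_12 from Table XX.
observed = [185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 14, 4, 0]
m = sum(observed)

# Estimate p from the data, which forces relation (128).
p_hat = sum(j * n for j, n in enumerate(observed)) / (12 * m)
probs = [comb(12, j) * p_hat**j * (1 - p_hat)**(12 - j) for j in range(13)]

# The x_j of (135).
x = [(n - m * p) / sqrt(m * p) for n, p in zip(observed, probs)]

def constraint(a):
    """Left side of sum(b_j x_j) = 0, with b_j = a_j * sqrt(m p_j)."""
    return sum(a_j * sqrt(m * p) * x_j for a_j, p, x_j in zip(a, probs, x))

from_130 = constraint([1] * 13)       # a_j = 1, condition (130)
from_128 = constraint(list(range(13)))  # a_j = j, condition (128)
print(from_130, from_128)
```

Both sums vanish apart from rounding error, so the observed point lies on the intersection of the two planes, and two dimensions are lost.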

Now exactly this same situation exists in general. A
single condition upon our variables (there is never less than
one, because the condition (130) must always be satisfied)
reduces us from a space of $s$ dimensions to one of $s - 1$ dimensions,
and requires that we integrate (139) over all those
portions of this space which are further from the origin than
a certain predetermined amount $\chi$. And $q$ conditions reduce
us to a space of $s - q$ dimensions, again requiring an integration
over all that region that lies outside a hypersphere of
radius $\chi$. So far as computation is concerned, therefore, we
are interested only in the integral of $K e^{-r^2/2}$ in a space of
$s' = s - q$ dimensions.

We can get some clue to how next to proceed by considering
what we would do in the case of a space of 1, 2 or 3 dimensions.
In one dimension we would of course have immediately

$$2K \int_{\chi}^{\infty} e^{-r^2/2}\, dr.$$

In two dimensions we would take advantage of the fact that
the integrand depends only on $r$ and write our integral in the
form

$$2\pi K \int_{\chi}^{\infty} e^{-r^2/2}\, r\, dr,$$

while in three dimensions we would write it as

$$4\pi K \int_{\chi}^{\infty} e^{-r^2/2}\, r^2\, dr.$$

It appears that in a space of $s'$ dimensions the integrand
would be of the form $e^{-r^2/2}\, r^{s'-1}\, dr$ multiplied by some constant
the law of formation of which is not apparent.¹ But we have
no occasion to know its exact value, since it can be amalgamated
with the unknown constant $K$ whose value we must
eventually determine anyway. We thus arrive at the result

$$P(> \chi^2) = K \int_{\chi}^{\infty} e^{-r^2/2}\, r^{s'-1}\, dr. \tag{141}$$


¹ The constant is actually $s' (\sqrt{\pi})^{s'} / (\tfrac12 s')!$, as we could readily show if we cared
to introduce hyperspherical coordinates. For our purposes we need not introduce
this form of coordinate system, which is probably unfamiliar to the student.




We are now ready to determine the value of $K$. We have
already said that the $K$ must have such a value that the sum
of all admissible probabilities is unity. But that sum is
obviously just the integral (141) taken over every value of $r$
from zero upward. This gives us

$$1 = K \int_{0}^{\infty} e^{-r^2/2}\, r^{s'-1}\, dr,$$

which readily reduces to the form

$$1 = K \int_{0}^{\infty} e^{-u}\, (2u)^{\frac12 s' - 1}\, du$$

if we make the substitution $r^2/2 = u$. Comparing this with
(6), however, we arrive at once at the value

$$K = \frac{1}{2^{\frac12 s' - 1} \left(\tfrac12 s' - 1\right)!}.$$

Substituting this in (141) we have as our final result

$$P(> \chi^2) = \frac{1}{2^{\frac12 s' - 1} \left(\tfrac12 s' - 1\right)!} \int_{\chi}^{\infty} e^{-r^2/2}\, r^{s'-1}\, dr. \tag{142}$$
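As a check on (142), the integral can be carried out in closed form whenever $s'$ is even. The two cases below are worked under no assumptions beyond the formula itself, using the substitution $u = r^2/2$.

```latex
% Case s' = 2:  K = 1 / (2^0 \cdot 0!) = 1.
P(>\chi^2) = \int_{\chi}^{\infty} e^{-r^2/2}\, r\, dr
           = \Bigl[-e^{-r^2/2}\Bigr]_{\chi}^{\infty}
           = e^{-\chi^2/2}.

% Case s' = 4:  K = 1 / (2^1 \cdot 1!) = 1/2.
P(>\chi^2) = \frac{1}{2} \int_{\chi}^{\infty} e^{-r^2/2}\, r^3\, dr
           = \frac{1}{2} \int_{\chi^2/2}^{\infty} 2u\, e^{-u}\, du
           = \Bigl(1 + \frac{\chi^2}{2}\Bigr) e^{-\chi^2/2}.
```

In both cases $P(> \chi^2)$ equals unity at $\chi = 0$ and falls steadily toward zero as $\chi$ increases, as the normalization of $K$ requires.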


This formula, like many others of frequent use, has been
reduced to the form of tables, one of which is given in Appendix
VIII. The values of $P$ are given in the headings of the columns,
and the values of $\chi^2$ in the body of the table. In order
to find the probability of any estimate, therefore, it is only
necessary to determine $s'$ and $\chi^2$, whereupon the approximate
value of $P$ can be taken at once from the table.
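Where the table is not at hand, (142) can be evaluated numerically. The sketch below is one possible approach, not the method of the text: with $u = r^2/2$ the formula becomes an incomplete gamma integral, which is here evaluated by its standard power series. The function name `chi2_tail` and its parameters are hypothetical.

```python
from math import exp, lgamma, log

def chi2_tail(chi2, s_prime, max_terms=10000):
    """P(> chi^2) of formula (142) for s_prime independent classes.

    Writes (142) as 1 - gamma(a, x) / Gamma(a) with a = s_prime / 2 and
    x = chi2 / 2, and sums the power series for the lower incomplete gamma.
    """
    a = s_prime / 2.0
    x = chi2 / 2.0
    if x <= 0.0:
        return 1.0
    # Series: gamma(a, x) = x^a e^(-x) * sum_k x^k / (a (a+1) ... (a+k)).
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < 1e-17 * total:
            break
    lower = total * exp(-x + a * log(x) - lgamma(a))
    return 1.0 - lower

print(chi2_tail(40.75, 11))   # Example 50: true dice
print(chi2_tail(12.68, 10))   # Example 51: equally biased dice
```

The series form is adequate for the moderate values met with here; for very large $\chi^2$ the subtraction from unity loses relative accuracy and a continued-fraction form would be preferable.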


§ 103. The Solution of Examples 50 and 51 


We are now prepared to renew our discussion of Weldon’s
dice data, with the direct purpose of determining whether the
fact that his results were classified as indicated in Table XX
affects our conclusion that the dice were probably biased.

Our theoretical considerations have taught us that the clue
to this question lies in the magnitude of the quantity which
we have called $\chi^2$. We must therefore consider how it is to
be computed.

We notice that by definition $x_j^2$, which is usually called the
“divergence” of $n_j$, is

$$x_j^2 = (1 - p_j)\, y_j^2 = \frac{\delta_j^2}{\epsilon(n_j)}.$$

To get $\chi^2$, it is therefore only necessary to divide the square
of each deviation by the expectation from which the deviation
was measured, and then add the results.

This has been done in connection with Tables XXI and
XXII, with the results shown in the last columns. The only
point about the table which is likely to occasion any surprise
is the grouping of the last two entries in computing the divergence,
the two numbers being added and used as if they were
one. The reason for this is to be found in the theory underlying
our derivation of the $\chi^2$ test, in which we concluded
that the substitution of Stirling’s Formula for the factorial
was not justifiable when the number $n_j$ was too small. How
we divide the data up, however, is a matter entirely for our
own judgment, and we can always place the divisions in such
a way that a fair number of observations will fall in each.
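The computation of $\chi^2$ just described, including the grouping of the last two classes, can be sketched as follows. The function name `chi_square` and the parameter `pool_last` are hypothetical, and the probabilities are recomputed from the Binomial Law without the intermediate rounding of the tables, so the totals may differ from Tables XXI and XXII in the second decimal place.

```python
from math import comb

# Observed frequencies n_0 ... n_12 from Table XX.
observed = [185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 14, 4, 0]

def binomial_probabilities(p, n=12):
    """Chance that exactly j of n dice show a 5 or 6."""
    return [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

def chi_square(observed, probs, pool_last=2):
    """Sum of the divergences, with the last `pool_last` classes treated as one."""
    m = sum(observed)
    expected = [m * p for p in probs]
    keep = len(observed) - pool_last
    # Pool the sparse tail classes into a single class.
    obs = observed[:keep] + [sum(observed[keep:])]
    exp_ = expected[:keep] + [sum(expected[keep:])]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp_))
    return chi2, len(obs)

chi2_true, classes_true = chi_square(observed, binomial_probabilities(1 / 3))
chi2_biased, classes_biased = chi_square(observed, binomial_probabilities(0.3377))
print(classes_true, round(chi2_true, 2))
print(classes_biased, round(chi2_biased, 2))
```

Grouping is specified explicitly here rather than by a threshold on the expectation, since where the divisions are placed is, as the text says, a matter for the investigator's judgment.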

Upon the assumption that the dice are true (Example 50)
we obtain $\chi^2 = 40.75$. Due to grouping the last two entries
in Table XXI there are just 12 variables; and as they must
satisfy (130) only 11 of them can be independent. Hence in
entering the table of $\chi^2$ in Appendix VIII to find the probability
of so large a deviation, we must make use of the set of
values corresponding to $s' = s - q = 11$. The number 40.75
is well off the right-hand side of the table; hence the probability
of such an occurrence with true dice is very much less than
0.01. From a more extensive table it would be found to be
0.00003. Looked at as a whole, as we did in § 100, Weldon’s
result appeared to be even less likely, but from either standpoint
it appears highly probable that the dice were inaccurate.

Passing now to Example 51 we find $\chi^2$ to be 12.68. But in
computing this table, we have made use of two linear relations
among the $n_j$’s: (128) and (130). Hence this time in entering
Appendix VIII for $P(> \chi^2)$ we must reduce the number of
variables by 2, and use the row marked $s' = 10$. We find
$P = 0.25$. That is, with identical biased dice for each of
which the probability of throwing a 5 or 6 was 0.3377, a
divergence as great as that observed could be expected about
once in four times.

I must emphasize again that these figures, taken only by 
themselves, decidedly do not say that the dice were probably 
biased. They only force one of two conclusions upon us:


1. Either the dice are biased, 
2. Or a very unexpected thing has happened. 


It is only because we feel that conclusion (1) is a plausible one 
—as of course we do in this case—that we choose it in 
preference to the less plausible (2). 


§ 104. Résumé of the Test of Goodness of Fit 


The discussion of the test of goodness of fit has occupied 
so many pages, and has been punctuated by so much mathe- 
matics of a kind with which the student is probably none too 
familiar, that it seems wise before leaving it to sketch hastily 
the main facts to which we have been led. 

We begin with the not unreasonable postulate that before 
an experiment is performed, there is a certain probability of 
it giving a specified result. We may not know what that 
probability is—if we did there would usually be no need for 
a test of goodness of fit; but at any rate such a probability 
exists. Also there is more than one possible result. Hence 
under the conditions of the experiment there are a group of 
probabilities, one for each possible result, which may be 
represented as a distribution function — or better, might be so 
represented if we knew what their values are. 

The difficulty which confronts us is exactly that we do not 
know what they are, but are trying by experiment to determine
them. Let us, however, ignoring this difficulty for a
moment, assume that they are represented by the distribution
function of Fig. 35; and seek to learn how unusual the result
of our experiment is.

The nature of the experiment may be such that only a discrete
set of events may occur (as was the case with the dice
problem which we have been considering) or it may concern
itself with such a variable as length, which can take any value
whatever. No matter which of these is true, we may divide
the entire range of variation into intervals (or classes) such
as those shown in Fig. 35. If the variable can take only a
discrete set of values, it may often be natural to make
each of these values a distinct class; we did this in the
case of the dice experiment, except in the case of the
values 11 and 12 which occurred so seldom that we
classed them together. But this is not necessary. We can
classify our data pretty much as we see fit. The classes need
not even contain equal ranges of the variable, as an indication
of which we have made them obviously unequal in the figure.

FIG. 35.

To each of these classes corresponds a very definite probability.
Hence we can compute the probability of $m$ events
partitioning themselves among the classes in such a way that
there are exactly $n_j$ in the $j$th class. When we do so we obtain
the Multinomial Law (129), which is too complicated for
purposes of computation. We find, however, that it is
quite accurately represented by the generalized Normal Law
(139), provided none of the numbers $n_j$ is too small. This
generalized Normal Law has, as its single variable, the quantity
$r^2$ formed from the sum of the “divergences” of the individual
classes, the term “divergence” meaning the square of the
deviation of $n_j$ from its expectation, divided by the expectation.

So much for pure a priori reasoning. Usually, however,
when we conduct an experiment we do not know the exact
form of the distribution function pictured in Fig. 35. Instead,
we are often attempting to find out what it is. Having gotten
our experimental results we attempt as best we can to infer 
what the distribution function is like and arrive at some 
estimates as to the probabilities associated with the various 
class intervals of Fig. 35. Next we naturally attempt to 
check up the reasonableness of our results; and in doing this 
are led to the idea of finding how probable it is that an experi- 
ment, conducted under our assumed conditions, would give a 
result that is at least as unusual as the experimental one. 

To do this, we find the sum of the divergences of the
experimental data, which we call $\chi^2$, and then ask for the
chance of an experiment giving a less likely (that is, a larger)
value of $r^2$ than this. To get the answer to this question it
is necessary to integrate (139) over all values of $r$ which
exceed $\chi$, the result of which integration is contained in the
formula (142), or for practical purposes in Appendix VIII.
Having thus obtained $P(> \chi^2)$, if it is not too small we conclude
that our assumed distribution function is a plausible one,
in the sense that the observations would not be miraculous if it
were the correct one, while if the probability is very small, we
conclude that the experiment was probably conducted under
conditions which differed materially from those assumed.
However, the assurance we feel in these conclusions must be 
tempered by our judgment as to their inherent plausibility. 
We must not, for instance, accept a preposterous assumption 
as justified merely because we get a high value of P; or one 
which is almost certain to be true as being disproved by a low 
value of P: for a low value of P merely says that the result is 
unusual, not that it could not occur. 

When we compute $P$ we find that it has different values
according to the number of classes into which we have divided
our range of possibilities. Hence the table of $P$’s is a double-entry
table, arranged according to values of $\chi^2$ and the “class-number”
$s'$. Moreover, we find that this class-number is
not the total number of classes $s$, but the number that, under
the conditions of the problem, may be assigned values arbitrarily;
that is, the number of independent classes. Since
the sum of all the $n_j$’s must equal the total number of events,
no more than $s - 1$ of them can ever be assigned arbitrarily;
and if in addition we make use of our data to compute constants
of our assumed formula, as in the case of Table XXII
we used the data to find the appropriate value of $p$, we must
reduce the class-number by one for each such condition.
This is the only difficulty in the application of the test for 
goodness of fit; its justification, as we have seen, is another 
matter. 

In the sections which follow we give a few more illustrations 
of its use. 


§ 105. Some Instructive Illustrations; Some Telephone Data 


In Chapter X we shall find that there is reason to believe
that the probability of just n pieces of telephone apparatus
being in use at a given instant is often given by the Poisson 
Formula. One case, in particular, to which we should expect 
this formula to apply reasonably well is the type of automatic 
equipment known as the “sender.” Hence as a second 
illustration we take the data contained in Table XXIII, 
which covers 3754 observations upon the number of senders 
busy in a panel type machine switching exchange. 

The Poisson Law is determined solely by its expectation $\epsilon$.
In the case of our data the average number of busy senders
is 10.44, and if we use this value as our estimate of $\epsilon$ the
Poisson Formula gives the values shown in the third column
under the heading “Expected Frequency.” The deviations
from expectation are listed in the fourth column; the divergences
in the fifth, and their total, which is $\chi^2$, at the bottom.
The quantities are all large enough that there is no occasion
to combine any of the classes, except in the case of the top
two (0 and 1). Hence there are in all 22 classes. However,
since we have determined our expectation from the data only 
20 of these classes are independent, for the only frequencies 
with which we dare compare our experimental results are 
those which have a sum 3754 and an average 10.44. We 
therefore look up the entry 43.43 in that portion of Appendix
VIII which corresponds to the class number 20, and find that
it is beyond the right-hand margin of the table — apparently 
at about P = 0.005. 

TABLE XXIII

NUMBER OF BUSY SENDERS IN A TELEPHONE EXCHANGE

Number      Observed     Expected     Deviation,    Divergence
Busy        Frequency    Frequency    $\delta$

0 and 1       ...          ...          ...            ...
   2           14           5.98       +  8.02        10.76
   3           24          20.82       +  3.18         0.49
   4           57          54.33       +  2.67         0.13
   5          111         113.44       -  2.44         0.05
   6          197         197.38       -  0.38         0.00
   7          278         294.38       - 16.38         0.91
   8          378         384.16       -  6.16         0.10
   9          418         445.63       - 27.63         1.71
  10          461         465.24       -  4.24         0.03
  11          433         441.56       -  8.56         0.17
  12          413         384.15       + 28.85         2.17
  13          358         308.50       + 49.50         7.94
  14          219         230.05       - 11.05         0.53
  15          145         160.11       - 15.11         1.43
  16          109         104.47       +  4.53         0.20
  17           57          64.16       -  7.16         0.80
  18           43          37.21       +  5.79         0.90
  19           16          20.45       -  4.45         0.97
  20            7          10.67       -  3.67         1.26
  21            8           5.31       +  2.69         1.36
  22          ...          ...          ...            ...

Total                    3753.77       +  0.23     $\chi^2$ = 43.43


The fit is none too good, and we would probably reject this 
solution as unsuitable, were it not for the theoretical basis 
which underlies the choice of the Poisson Formula, and which 
we cannot entirely overlook when considering the significance 
of such a low number. 
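
The expected-frequency column of Table XXIII and the class number can be reproduced with a short sketch. It assumes only the figures quoted in the text (3754 observations, an average of 10.44, and 22 classes); the function and variable names are illustrative and are not taken from the book.

```python
import math

TOTAL_OBSERVATIONS = 3754
EPSILON = 10.44  # average number of busy senders, used as the estimate of the expectation


def poisson_probability(n, epsilon):
    """Poisson chance of exactly n busy senders."""
    return math.exp(-epsilon) * epsilon ** n / math.factorial(n)


# Expected frequencies for the uncombined classes n = 2, ..., 21.
expected = {n: TOTAL_OBSERVATIONS * poisson_probability(n, EPSILON)
            for n in range(2, 22)}

# Two conditions are imposed on the theoretical frequencies:
# their sum (3754) and their average (10.44).
classes = 22
independent_classes = classes - 2

for n in (2, 10, 13):
    print(n, round(expected[n], 2))
print("class number:", independent_classes)
```

The printed frequencies can be compared line by line with the third column of the table; small differences in the last digit are to be expected from rounding of the average.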




§ 106. Some Instructive Illustrations; Chips Drawn from a 
Normal Universe 

Our next illustration will be one in which the conditions of 
the experiment are controlled artificially, so that the data can 
be said in advance to follow a certain law. 

Example 52.—By reference to Appendix V it can be found that 
if a variable is governed by the Normal Law it has a chance 0.1974
of lying between — 0.25 and + 0.25; a chance 0.1747 of lying between 
— 0.25 and — 0.75, and so on, as shown in the third column of Table 
XXIV. An experiment was performed by marking 197 chips with 
the number 0; 175 with each of the numbers — 0.5 and + 0.5; and 
so on, the number carrying each marking being the same as the first 
three digits of the probability corresponding to the interval in which 
this number lay. These chips were then placed in a box, thoroughly 
mixed, and one drawn out. After its marking had been noted it was 
replaced, the contents of the box again mixed, and another drawing 
made. The results of 1000 such drawings were distributed as shown 
in the second column of Table XXIV. 

What is the probability that another similar experiment would 
deviate from expectation as much as this one did ? 

The entire computation is shown in Table XXIV. The 
number expected is just the theoretical probability multiplied 
by 1000; hence the only condition imposed upon our theoretical 
variables is that their sum shall equal the number of drawings. 
As there are 13 classes in the table, this leaves 12 independent, 
and we enter the table of $\chi^2$ with the class number $s' = 12$.
The answer to the question asked in the example is therefore 
0.74. This is a very high probability — our drawings were 
nearer to the Normal Law than we could reasonably have 
expected. 
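
The computation of Example 52 can be sketched in a few lines. The observed counts and theoretical probabilities are those of Table XXIV; the routine `prob_exceeding` is an assumption of this sketch, standing in for the table of Appendix VIII by evaluating the chi-square tail probability from the series for the incomplete gamma function, with the class number $s'$ used as the number of independent classes.

```python
import math

# Table XXIV: observed counts and theoretical probabilities for the 13 markings.
observed = [5, 9, 36, 55, 123, 165, 203, 172, 123, 68, 31, 8, 2]
probability = [0.0024, 0.0093, 0.0278, 0.0656, 0.1210, 0.1747, 0.1974,
               0.1747, 0.1210, 0.0656, 0.0278, 0.0093, 0.0024]
DRAWINGS = 1000


def chi_square(observed, expected):
    """Sum of the divergences (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def prob_exceeding(chi2, independent_classes):
    """P(> chi2), via the power series for the lower incomplete gamma function."""
    a = independent_classes / 2.0
    x = chi2 / 2.0
    if x <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower


expected = [DRAWINGS * p for p in probability]
chi2 = chi_square(observed, expected)
independent_classes = len(observed) - 1   # only the total is imposed
p_value = prob_exceeding(chi2, independent_classes)
print(round(chi2, 2), round(p_value, 2))
```

The series converges for any argument but loses relative accuracy when the tail probability is extremely small, so it is best read as a check on values of ordinary size rather than as a replacement for a fuller table.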


§ 107. The Determination of a Suitable Distribution Function 
when no Theoretical Formula is Known

We now have a method for testing how well our assumed 
distribution fits the data, but so far we have applied it only to 
data which we had reason to believe followed some one of our 
well-known laws. In many cases we have no such information 
in advance and are forced to make a purely empirical choice
of the curve. Our next problem is to build up a systematic
method of attacking this problem.




To begin with, we remember that moments and expecta- 
tions are correlated ideas, the moments being derived directly 
from data and the expectations being the analogous quantities 
as computed from the distribution functions. An actual 
experiment is not likely to give us moments that are exactly 
equal to their corresponding expectations, but we have seen 
in Chapter VII that if the experiment is extensive the chance
of any great disagreement between them is small. It seems
natural, then, when we know nothing about the nature of the
distribution in advance, to assume that the distribution function
has expectations equal to the moments of the data. The
assumption is false, of course, but it is probably as near the
truth as any we can make.


TABLE XXIV

AN ARTIFICIALLY CONTROLLED EXPERIMENT

Marking    Number      Theoretical    Number      Deviation    Divergence
           Observed    Probability    Expected

 -3.0          5         0.0024          2.4       +  2.6         2.82
 -2.5          9         0.0093          9.3       -  0.3         0.01
 -2.0         36         0.0278         27.8       +  8.2         2.42
 -1.5         55         0.0656         65.6       - 10.6         1.71
 -1.0        123         0.1210        121.0       +  2.0         0.03
 -0.5        165         0.1747        174.7       -  9.7         0.54
  0.0        203         0.1974        197.4       +  5.6         0.16
 +0.5        172         0.1747        174.7       -  2.7         0.04
 +1.0        123         0.1210        121.0       +  2.0         0.03
 +1.5         68         0.0656         65.6       +  2.4         0.09
 +2.0         31         0.0278         27.8       +  3.2         0.37
 +2.5          8         0.0093          9.3       -  1.3         0.18
 +3.0          2         0.0024          2.4       -  0.4         0.07

Total       1000         0.9990        999.0       +  1.0     $\chi^2$ = 8.47

$s' = 12$
P = 0.74

Our process of fitting data will then have, as its first ele- 
ment, the location of some distribution type which, by 
suitable choice of the arbitrary constants in its equation, may 
be made to have the desired set of expectations. In searching
for such a suitable type we shall find the measures of asymmetry
and flatness of considerable assistance.

If a distribution function is symmetrical all the expectations 
of odd order are zero, for negative and positive deviations have
like probabilities and therefore destroy one another. It
therefore seems natural to measure asymmetry by means of
these odd expectations. However, the first expectation cannot 
be used, for it is zero by definition; hence it is customary to 
make use of the third. But if it is to measure a property of the 
shape of the curve, and not be affected by the scale to which 
it is drawn, it is necessary to specify a standard scale of measure, 
which is accomplished by requiring that the deviations be 
measured in terms of their own standard deviation as a unit. 
This gives us the following definition: 


The third expectation of the deviations of a variable n, expressed
in terms of its own standard deviation as a unit, is the asymmetry¹
of the distribution function which n obeys. It is denoted
by the symbol $\sqrt{\beta_1}$.


¹ Also called “skewness.”


In addition to this measure of asymmetry, however, we
need also a measure of flatness. To illustrate what we
mean by this term, we may refer to Fig. 36, both curves
of which are symmetrical and have the same standard deviation,
though they differ from one another to an important
degree. We phrase this difference by saying that B is “flatter”
than A.

The portions of area outside A and inside B are just equal
to the portions inside A and outside B, otherwise the curves
would not bound equal areas; and they are so shaped that their
second expectations are equal. Otherwise the two would not
have the same standard deviation. But the fourth expectation
of A is obviously bigger than the fourth expectation of B, due
to the fact that the larger tail of A plays an increasingly
important part the higher the order of the expectation. Hence
the fourth expectation affords a method of measuring flatness.
The exact definition is:


The fourth expectation of the deviations of a variable n,
measured in terms of its own standard deviation, is the flatness¹
of the distribution function which n obeys. It is denoted by the
symbol $\beta_2$.

We must observe at once, however, that in computing 
asymmetry and flatness we need not bother to reduce all our 
data to the units specified in the definitions. In fact, the third
expectation is by definition the sum of quantities each of
which is the product of a probability by the cube of a deviation.
The probability being a pure number, the third expectation
varies as the inverse cube of the unit in terms of which the
deviations are measured. That is, we may compute our third
expectation to any scale we please, and then divide it by $\sigma^3$
in order to reduce it to the units demanded by the definition
of asymmetry. Similarly, the flatness will ordinarily be
obtained from the fourth expectation divided by $\sigma^4$.
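
The independence of scale can be checked numerically. The sketch below computes the third and fourth expectations of the deviations, divides them by $\sigma^3$ and $\sigma^4$, and repeats the calculation after an arbitrary change of unit. The distribution used (twelve trials with chance 1/3 each) and the factor 2.54 are assumptions chosen only for illustration.

```python
import math


def asymmetry_and_flatness(values, probabilities):
    """Return the third and fourth expectations of the deviations in units of sigma."""
    mean = sum(v * p for v, p in zip(values, probabilities))

    def expectation_of_deviation(order):
        return sum(p * (v - mean) ** order for v, p in zip(values, probabilities))

    sigma = math.sqrt(expectation_of_deviation(2))
    return (expectation_of_deviation(3) / sigma ** 3,
            expectation_of_deviation(4) / sigma ** 4)


# Illustrative law (an assumption of this sketch): twelve trials, chance 1/3 each.
values = list(range(13))
probabilities = [math.comb(12, n) * (1 / 3) ** n * (2 / 3) ** (12 - n) for n in values]

asymmetry, flatness = asymmetry_and_flatness(values, probabilities)

# Measuring the same variable in a different unit leaves both quantities unchanged.
rescaled = [2.54 * v for v in values]
asymmetry_rescaled, flatness_rescaled = asymmetry_and_flatness(rescaled, probabilities)

print(round(asymmetry, 5), round(flatness, 5))
print(round(asymmetry_rescaled, 5), round(flatness_rescaled, 5))
```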

Quite similar concepts can be defined in the case of sets of 
experimental data, except that we deal then with moments 
instead of expectations. We may indicate them sufficiently 
well by means of the following parallel résumé: 


Properties of Distribution Functions

The first expectation of a variable n is defined by (83).

Deviations $\delta$ are measured from this expectation.

The first expectation of $\delta$ is therefore $\epsilon_1 = 0$. [See (96).]

The square root of the second expectation of $\delta$ is called the
“standard deviation.” That is, $\sigma = \sqrt{\epsilon_2}$.

The asymmetry is the third expectation, provided the deviations are
measured in terms of $\sigma$ as a unit. In general the formula is
$\sqrt{\beta_1} = \epsilon_3/\sigma^3$.

The flatness is the fourth expectation, provided the deviations are
measured in terms of $\sigma$ as a unit. If not, the formula for it is
$\beta_2 = \epsilon_4/\sigma^4$.


Properties of Experimental Data

The first moment of a set of data is their average, as defined by (82).

Deviations d are measured from this average.

The first moment of the set of d’s is therefore $\mu_1 = 0$. [See (93).]

The square root of the second moment of the deviation is called
the standard deviation. That is, $\sigma = \sqrt{\mu_2}$.

The asymmetry is the third moment, provided the deviations are
measured in terms of $\sigma$ as a unit. In general the formula is
$\sqrt{\beta_1} = \mu_3/\sigma^3$.

The flatness is the fourth moment, provided the deviations are
measured in terms of $\sigma$ as a unit. If not, the formula for it is
$\beta_2 = \mu_4/\sigma^4$.


¹ Also called “kurtosis.”


In addition to these characteristics there are a number of 
others, with less precise geometrical significance, which have 
been found of service in sorting out the type of curve which is 
most suitable for a particular set of data. The only one which 
need claim our attention is the combination of quantities 


$$J = 3\beta_1 - 2\beta_2 + 6,$$


to which we shall give the name “ Type Criterion.” For 
some types of distribution curves this type criterion is always 
positive, for some negative, and for still others zero. Naturally, 
then, it is a helpful thing to have. 
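
As a worked check on the sign of the type criterion, the asymmetry and flatness of three familiar laws can be substituted into the definition of J. The values of $\beta_1$ and $\beta_2$ used below are the usual textbook ones, quoted from general knowledge rather than from Appendix XI; m and p denote the number of trials and the chance of success in the Binomial Law, and $\epsilon$ the Poisson expectation.

```latex
\begin{align*}
\text{Normal:}   &\quad \beta_1 = 0,\ \beta_2 = 3
  &&\Rightarrow\quad J = 3(0) - 2(3) + 6 = 0, \\
\text{Poisson:}  &\quad \beta_1 = \frac{1}{\epsilon},\ \beta_2 = 3 + \frac{1}{\epsilon}
  &&\Rightarrow\quad J = \frac{3}{\epsilon} - 6 - \frac{2}{\epsilon} + 6
     = \frac{1}{\epsilon} > 0, \\
\text{Binomial:} &\quad \beta_1 = \frac{(1-2p)^2}{mp(1-p)},\
                   \beta_2 = 3 + \frac{1 - 6p(1-p)}{mp(1-p)}
  &&\Rightarrow\quad J = \frac{3(1-2p)^2 - 2\,[\,1 - 6p(1-p)\,]}{mp(1-p)}
     = \frac{1}{mp(1-p)} > 0.
\end{align*}
```

So the criterion vanishes for the Normal Law and is positive for the Poisson and Binomial Laws, whatever the values of their constants.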

In Appendix XI we have listed the equations of the Normal,
Poisson and Binomial Laws, the Gram-Charlier Series,
and the four Pearson Types which we have discussed; and
underneath them have written out the general formulae for
the quantities $\epsilon_1(n)$, $\epsilon_2$, $\epsilon_3$, $\epsilon_4$, $\sigma$, $\sqrt{\beta_1}$, and $\beta_2$, and have also
indicated the sign of the type criterion for each. Appendix
XI, then, is of the nature of a compendium of all the
basic information which underlies the fitting of curves to
data.

But we can do more than this. We can find, by suitable
investigations, that the only ones of these curves which can
have, at the same time, $\beta_1 = 0$ and $J < 0$ are the Pearson
Type IV and the Gram-Charlier Series. Similarly the only
ones for which both $\beta_1$ and J can be zero are the Normal
Law and the Gram-Charlier Series. In this way we can
consider in turn every possible combination of values of the
quantities $\beta_1$ and J. When the results are arranged schematically
they lead to the table given in Appendix X.¹

Suppose, then, that we have a set of data to which we
wish to fit a suitable distribution curve. From it we may
easily obtain its various moments, and thus derive the five
quantities

$\bar{n}$, average,

$\sigma$, standard deviation,

$\sqrt{\beta_1}$, asymmetry,

$\beta_2$, flatness,

J, type criterion.


Then by entering Appendix X with the values of $\beta_1$ and J
we may sort out certain types which seem appropriate for our
purpose. Finally, we may so determine the arbitrary constants
in our chosen equation that the expectations of the
distribution function are equal to the moments of the data.²

Much of the labor of this final step may be carried out
once for all by algebraic means, leading to formulae in which
it is only necessary to substitute the values of the quantities
$\bar{n}$, $\sigma$, $\beta_1$ and $\beta_2$ in order to determine the constants of the
equation directly.

Appendix XI also contains the equations which are needed 
for this purpose, listed with the title “ Equations for Determin- 
ing Constants.” 

We have then in Appendices X and XI all the information 
which is needed for the determination of our distribution 


¹ It must be remembered that $\beta_1$ is the square of the asymmetry and therefore
can never be negative.


2 We should not fail to observe that this process leads us to equations of exactly 
the form (140). For the ith expectation of the variable (which was called 7 in § 102) 
is 


e(j) = Dips; 
and the ith moment of the observed values is 
bij) = Zin. 


Equating the two leads at once to (140), the coefficients being aj = 7. 




curve, provided, of course, that any one of the eight types 
which we have been discussing is suitable for the purpose. 
From this point on we can best discuss the problem of “ curve 
fitting ”’ by the consideration of several examples. 


§ 108. Some Instructive Illustrations; A Reconsideration of 
Weldon’s Dice Data 


As a first illustration, we consider again the data presented
in Table XX. If we choose to forget its origin, we may ask
ourselves to obtain an appropriate distribution law for it.
We first determine the various moments of n as shown in
Table XXV. Then by the use of the formulae¹

$$\mu_2(d) = \mu_2(n) - \bar{n}^2,$$
$$\mu_3(d) = \mu_3(n) - 3\mu_2(d)\,\bar{n} - \bar{n}^3,$$
$$\mu_4(d) = \mu_4(n) - 4\mu_3(d)\,\bar{n} - 6\mu_2(d)\,\bar{n}^2 - \bar{n}^4,$$


we compute the various moments of the deviations, and then 
in turn the standard deviation, asymmetry and flatness. 
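
The chain of computations just described can be followed in a short sketch. It starts from the average and the raw moments of n as quoted in Table XXV (they are entered here as given, not recomputed from the frequencies), applies the three conversion formulae for the moments of the deviations, and then forms the standard deviation, asymmetry, flatness and type criterion. The variable names are illustrative.

```python
import math

# Average and raw moments of n, as quoted in Table XXV.
n_bar = 4.0522694
mu2_n = 19.119972
mu3_n = 100.23759
mu4_n = 570.85904

# Moments of the deviations d, from the conversion formulae.
mu2_d = mu2_n - n_bar ** 2
mu3_d = mu3_n - 3 * mu2_d * n_bar - n_bar ** 3
mu4_d = mu4_n - 4 * mu3_d * n_bar - 6 * mu2_d * n_bar ** 2 - n_bar ** 4

sigma = math.sqrt(mu2_d)              # standard deviation
asymmetry = mu3_d / sigma ** 3        # square root of beta_1
beta1 = asymmetry ** 2
beta2 = mu4_d / sigma ** 4            # flatness
type_criterion = 3 * beta1 - 2 * beta2 + 6

print(round(mu2_d, 6), round(mu3_d, 5), round(mu4_d, 4))
print(round(sigma, 6), round(asymmetry, 4), round(beta2, 5), round(type_criterion, 4))
```

Run as written, the moments of the deviations and the flatness agree with the tabulated figures, while the asymmetry and the type criterion agree with them to about three significant figures; both $\beta_1$ and J come out positive, which is all that the entry into Appendix X requires.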

As a result of all these computations it turns out that both
$\beta_1$ and J are greater than zero. Entering Appendix X with
this information we find that either the Pearson Type I, the
Binomial or the Poisson Formula (as well, of course, as the
Gram-Charlier Series which can be used for any type of data)
satisfies the conditions of the problem. Let us, then, attempt to
treat the data by means of all of these methods except the
Pearson one (for which the computations are difficult), and
see which gives the most satisfactory fit. Our knowledge of
the origin of the data would lead us to expect that the Binomial
Law would be better than any of the others, but we are laying
aside this part of our knowledge regarding the data and treating
it as if it had no theoretical significance.


1 The first of these is identical with (94). The rest are obtained in just the same 
way as (94). 




TABLE XXV

THE MOMENTS AND RELATED STATISTICS OF WELDON’S DICE DATA

  n     Observed         nf          n²f          n³f            n⁴f
        Frequency, f

  0        185             0            0              0               0
  1      1,149         1,149        1,149          1,149           1,149
  2      3,265         6,530       13,060         26,120          52,240
  3      5,475        16,425       49,275        147,825         443,475
  4      6,114        24,456       97,824        391,296       1,565,184
  5      5,194        25,970      129,850        649,250       3,246,250
  6      3,067        18,402      110,412        662,472       3,974,832
  7      1,331         9,317       65,219        456,533       3,195,731
  8        403         3,224       25,792        206,336       1,650,688
  9        105           945        8,505         76,545         688,905
 10         14           140        1,400         14,000         140,000
 11          4            44          484          5,324          58,564
 12          0             0            0              0               0

Total   26,306       106,599      502,970      2,636,850      15,017,018

$\bar{n} = 4.0522694$

$\mu_2(n) = 19.119972$

$\mu_3(n) = 100.23759$

$\mu_4(n) = 570.85904$

$\mu_2(d) = \mu_2(n) - \bar{n}^2 = 2.699085$

$\mu_3(d) = \mu_3(n) - 3\mu_2(d)\,\bar{n} - \bar{n}^3 = 0.88347$

$\mu_4(d) = \mu_4(n) - 4\mu_3(d)\,\bar{n} - 6\mu_2(d)\,\bar{n}^2 - \bar{n}^4 = 20.9651$

$\sigma = \sqrt{\mu_2(d)} = 1.642889$

$\beta_1 = [\mu_3(d)]^2/\sigma^6 = 0.03965$

$\sqrt{\beta_1} = 0.19912$

$\beta_2 = \mu_4(d)/\sigma^4 = 2.87782$

$J = 0.36331$




Let us first consider the Binomial case. Referring to 
Appendix XI, we find that the constants of the formula are to 
be determined from the relations 


Thus we derive p = 0.3339325, and m = 12.13500. Using
these values, we obtain for our empirical distribution function
the formula

$$p(n) = C^{12.135}_{n}\,(0.3339325)^{n}\,(0.6660675)^{12.135-n}.$$

From this formula the values shown in the third column of
Table XXVI were obtained.¹

To get the expected frequency it is only necessary to multi- 
ply each probability by 26,306. Finally, the divergence is 
computed in the usual way. 
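
The fitting of the Binomial constants and the step-by-step evaluation of the probabilities can be sketched as follows. The sketch assumes that the relations taken from Appendix XI are the usual moment equations $mp = \bar{n}$ and $mp(1-p) = \sigma^2$; that assumption is not confirmed by the text itself, though it does reproduce the quoted values of p and m. The probabilities are built up from $p(0)$ by the ratio of successive terms.

```python
# Average and second moment of the deviations, as quoted in Table XXV.
N_BAR = 4.0522694
VARIANCE = 2.699085
TOTAL = 26306

# Assumed moment equations: m p = average, m p (1 - p) = variance.
p = 1.0 - VARIANCE / N_BAR
m = N_BAR / p
q = 1.0 - p

# p(0) = q^m, then p(n) = r_n p(n - 1) with r_n = ((m + 1)/n - 1)(p/q).
probabilities = [q ** m]
for n in range(1, 13):
    ratio = ((m + 1.0) / n - 1.0) * (p / q)
    probabilities.append(ratio * probabilities[-1])

expected = [TOTAL * value for value in probabilities]

print(round(p, 7), round(m, 4))
for n, (prob, freq) in enumerate(zip(probabilities, expected)):
    print(n, round(prob, 6), round(freq, 1))
```

Because m is not an integer, the thirteen probabilities do not sum exactly to unity; the shortfall is in the seventh decimal place.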

The resultant criterion as to goodness of fit is $\chi^2 = 11.59$,
which is really smaller than that obtained in § 100. But we 
must not be misled by this fact. In the first place, the theoret- 
ical foundation underlying the Binomial Law strongly suggests 
that the value of m should be integral, and the use of the 


¹ The process of computation was as follows: The value of $p(0) = (0.6660675)^{12.135}$
was first found by the use of logarithms. Then it was noted that

$$\frac{p(n)}{p(n-1)} = \left(\frac{13.135}{n} - 1\right)\left(\frac{0.3339325}{0.6660675}\right).$$

With the aid of a computing machine the value of this ratio is easily obtained for
each value of n. Call it $r_n$. Then the remaining probabilities are found from the
formulae

$$p(1) = r_1\,p(0), \qquad p(2) = r_2\,p(1), \qquad \ldots, \qquad p(n) = r_n\,p(n-1).$$




value 12.135 instead of 12 is itself open to suspicion, particularly 
so when we remember that there were really twelve dice. In 
the second place, in § 100 we required our formula to agree 
with the data in only two particulars: that p = 0.3377, and
that the expected number of successes should equal the number 


TABLE XXVI

AN EMPIRICAL BINOMIAL FORMULA FOR WELDON’S DICE DATA

                         Empirical Binomial Law
  n     Observed
        Frequency    Probability    Frequency    Divergence

  0        185        0.007218         189.9        0.13
  1      1,149        0.043911       1,155.1        0.03
  2      3,265        0.122567       3,224.2        0.52
  3      5,475        0.207594       5,461.0        0.04
  4      6,114        0.237687       6,252.6        3.07
  5      5,194        0.193880       5,100.2        1.73
  6      3,067        0.115589       3,040.7        0.23
  7      1,331        0.050790       1,336.1        0.02
  8        403        0.016345         430.0        1.70
  9        105        0.003765          99.0        0.36
 10         14        0.000592          15.6        0.16
 11          4        0.000058           1.5
 12          0        0.000003           0.1        3.60 (classes 11 and 12 combined)

Total   26,306        0.999999      26,306.0     $\chi^2$ = 11.59


observed. Here we are requiring in addition that they agree
also as to standard deviation. Hence, when we enter Appendix
VIII to find $P(> \chi^2)$ we must use in the present instance the
row marked $s' = 9$, whereas before we could use $s' = 10$. The
result is, that we now have $P(> \chi^2) = 0.24$, where before
we had $P(> \chi^2) = 0.25$. So even on the purely formal side,
if we use our $\chi^2$ criterion correctly, we have no better result
than before.

Turning to the case of the Poisson Formula, we find from
Appendix XI that the only constant upon which the formula
depends (namely, $\epsilon$) is equal to $\bar{n}$. Using this value we
obtain¹ the probabilities shown in the third column of Table
XXVII; and from them the frequencies shown in the next
column, and the divergences in the column following that.
In this case we have made our formula agree with the data
in only two respects: the total number of successes and the
average number. Hence it is only necessary to reduce the


TABLE XXVII

AN ATTEMPT TO FIT WELDON’S DICE DATA WITH A POISSON LAW

                         Empirical Poisson Law
  n     Observed
        Frequency    Probability    Frequency    Divergence

  0        185         0.01738         457.2       162.1
  1      1,149         0.07044       1,853.0       267.5
  2      3,265         0.14272       3,754.4        63.8
  3      5,475         0.19278       5,071.3        32.1
  4      6,114         0.19530       5,137.6       185.6
  5      5,194         0.15828       4,163.7       254.9
  6      3,067         0.10690       2,812.1        23.1
  7      1,331         0.06188       1,627.8        54.1
  8        403         0.03135         824.7       215.6
  9        105         0.01411         371.2       190.9
 10         14         0.00572         150.5       123.8
 11          4         0.00211          55.5
 12          0         0.00102          26.8        74.5 (classes 11 and 12 combined)

Total   26,306         0.99999      26,305.8     $\chi^2$ = 1648.0


¹ The computations were carried out as follows: The formula is

$$p(n) = \frac{(4.0522694)^n}{n!}\,e^{-4.0522694},$$

and reduces to $e^{-4.0522694}$ when n is zero. This value may be found by the use of
logarithms. Then it is observed that

$$\frac{p(n)}{p(n-1)} = \frac{4.0522694}{n},$$

the values of which are easily written down at once. Let us denote them by $r_n$.
Then the p’s are found by successive multiplications by these quantities $r_n$, just as
in the case of the computations which led to Table XXVI.

In this case, however, the Poisson Law gives appreciable values for p(n) when
n exceeds 12. We have therefore interpreted the entry n = 12 as being equivalent
to $n \ge 12$, the number 0.00102 which stands in the third column of the table being
actually the chance of “twelve or more” successes as given by the Poisson Law.




number of classes by two in entering Appendix IX, thus using
the marginal number $s' = 10$. But the value $\chi^2 = 1648$ is so
very great that P is extremely small. The fit in this case is
very bad, and we are thoroughly justified in the presumption 
that the data did not obey the Poisson Law. 
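
The Poisson trial can be retraced with the sketch below. The observed frequencies are those of Weldon’s data and the expectation is the average quoted in the text. Two points are assumptions of this sketch, inferred from the table rather than stated outright: the last entry is taken as “twelve or more,” and the classes 11 and 12 are combined before the divergences are summed, which leaves twelve classes.

```python
import math

observed = [185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 14, 4, 0]
TOTAL = sum(observed)
EPSILON = 4.0522694   # the average, used as the Poisson expectation

# p(0) = exp(-epsilon); then p(n) = (epsilon / n) p(n - 1) for n = 1, ..., 11.
probabilities = [math.exp(-EPSILON)]
for n in range(1, 12):
    probabilities.append(probabilities[-1] * EPSILON / n)
# The last entry is read as the chance of twelve or more successes.
probabilities.append(1.0 - sum(probabilities))

# Combine the classes 11 and "12 or more" (an inference from the table layout).
observed_classes = observed[:11] + [observed[11] + observed[12]]
expected_classes = [TOTAL * value for value in probabilities[:11]]
expected_classes.append(TOTAL * (probabilities[11] + probabilities[12]))

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed_classes, expected_classes))
independent_classes = len(observed_classes) - 2   # total and average are imposed

print(round(chi2, 1), independent_classes)
```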

Finally, if we make use of a Gram-Charlier Series, we have
from Appendix XI the relations:

$$a = \bar{n} = 4.0522694, \qquad \sigma = 1.642889,$$
$$A_1 = 0, \qquad A_2 = 0,$$
$$A_3 = -\frac{\sqrt{\beta_1}}{6} = -0.033187, \qquad A_4 = \frac{\beta_2 - 3}{24} = -0.00509083.$$

Hence our empirical series takes the form

$$p(n) = \frac{1}{1.642889}\left(\phi - 0.033187\,\phi''' - 0.00509083\,\phi^{iv}\right), \tag{143}$$

the argument of the $\phi$ being, in every case, $\dfrac{n - 4.0522694}{1.642889}$.


TABLE XXVIII

AN ATTEMPT TO FIT WELDON’S DICE DATA WITH A GRAM-CHARLIER SERIES

                       Empirical Gram-Charlier Law
  n     Observed
        Frequency    Probability    Frequency    Divergence

  0        185         0.00989         260.2       21.73
  1      1,149         0.04521       1,189.3        1.37
  2      3,265         0.12076       3,176.7        2.45
  3      5,475         0.20547       5,405.1        0.90
  4      6,114         0.23630       6,216.1        1.68
  5      5,194         0.19257       5,065.7        3.25
  6      3,067         0.11545       3,037.0        0.30
  7      1,331         0.05209       1,370.3        1.13
  8        403         0.01732         455.6        6.07
  9        105         0.00415         109.2        0.16
 10         14         0.00069          18.2        0.97
 11          4         0.00009           2.4        1.07 (classes 11 and 12 combined)
 12          0

Total   26,306         0.99999      26,305.8     $\chi^2$ = 41.08




By the use of this series¹ the probabilities shown in the
third column of Table XXVIII were derived; and from them
the expected frequencies and divergences shown in the next
two columns. The value of $\chi^2$ is found to be 41.08. In
finding the probability of so large a value we must give due
regard to the fact that we have forced our empirical law to
conform to the data in five respects: in the total number of
successes observed, and in the first four moments of n. Hence


¹ The Gram-Charlier Series, unlike the other two laws of which we have made
use, allows its argument to vary continuously. The same is true of the various
Pearson Types. Hence the remarks which follow apply to them also.

When dealing with experimental data which assigns a certain observed frequency
to each of a group of discrete values of n, either because n is incapable of taking
intermediate values, or because the observations have been arbitrarily classified in this
way, we are forced to divide our theoretical distribution curve into a number of segments
which correspond as well as may be with the classes into which the data has
been grouped. In the present instance the experimental data is given for integral
values of n ranging from 0 to 12: so we make an equivalent set of divisions in the
case of the Gram-Charlier Series. To correspond to n = 1 we take the range
0.5 < n < 1.5; to correspond to n = 2 the range 1.5 < n < 2.5, and so on. There
is no difficulty about this choice, which seems perfectly natural in view of the
considerations set forth in § 93. The two end values, n = 0 and n = 12, however,
are not so clear-cut, for the series would allow values of n as small or as large as we
please. If we were to use ranges of unit width about these values as the centers,
we would exclude the tails of our distribution function entirely, which seems not
to be allowable since use has been made of them in evaluating the formulae contained
in Appendix XI. We choose what appears to be the only alternative: we regard
the end ranges as extending from $-\infty$ to 0.5 and from 11.5 to $\infty$, respectively.

The probability corresponding to each value of n is now represented by a segment
of area under the distribution curve, somewhat similar to those in Fig. 35. Obviously
the function is varying too rapidly to permit us to use the ordinate in the middle
of the segment as a satisfactory approximation to the area. We are forced, then, to
compute the probabilities which appear in the third column of Table XXVIII by
actual integration. We find, however, that the integral of (143) is

$$P(n) = \int p(n)\,dn = \phi_{-1} - 0.033187\,\phi'' - 0.00509083\,\phi'''.$$

By this we mean, of course, the indefinite integral. In terms of it, the areas of the
segments, which are our desired probabilities, are expressed as follows:

$$p(0) = P(0.5) - P(-\infty), \qquad p(1) = P(1.5) - P(0.5),$$
$$p(2) = P(2.5) - P(1.5), \qquad \ldots, \qquad p(12) = P(\infty) - P(11.5).$$




only 7 of the 12 classes into which our data is divided can be 
regarded as independent. We find that $P(> \chi^2)$ is very
small indeed. (From a larger table than that of Appendix 
VIII it can be found to be approximately 0.000001.) 
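
The integration used for the Gram-Charlier probabilities can be sketched with the error function in place of the table of Appendix V. The constants are those of the empirical series; the closed forms $\phi'' = (x^2 - 1)\phi$ and $\phi''' = -(x^3 - 3x)\phi$ are the standard derivatives of the Normal ordinate, and the function names are illustrative. No attempt is made to reproduce every printed digit, since the tabulated values rest on hand computation.

```python
import math

MEAN = 4.0522694
SIGMA = 1.642889
A3 = -0.033187
A4 = -0.00509083


def normal_ordinate(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integral(n):
    """Indefinite integral P(n) of the series: phi_{-1} + A3 phi'' + A4 phi'''."""
    if n == -math.inf:
        return 0.0
    if n == math.inf:
        return 1.0
    x = (n - MEAN) / SIGMA
    phi_minus_1 = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))   # area to the left of x
    phi_2 = (x * x - 1.0) * normal_ordinate(x)                 # second derivative
    phi_3 = -(x ** 3 - 3.0 * x) * normal_ordinate(x)           # third derivative
    return phi_minus_1 + A3 * phi_2 + A4 * phi_3


# Class boundaries: -infinity, 0.5, 1.5, ..., 11.5, +infinity.
boundaries = [-math.inf] + [n + 0.5 for n in range(12)] + [math.inf]
probabilities = [integral(upper) - integral(lower)
                 for lower, upper in zip(boundaries, boundaries[1:])]

for n, value in enumerate(probabilities):
    print(n, round(value, 5))
```

The thirteen segment areas necessarily sum to unity, because each boundary value of P(n) is added once and subtracted once.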

What, now, can we learn from all this computation? We 
have seen that our criteria $\beta_1$ and J automatically eliminated
from consideration all but four of our numerous empirical laws. 
We have also seen how the computations may be carried out 
for three of these four, and how we may test the excellence of 
the fit obtained from each of them. Finally, we have found 
that, even had we had no advance knowledge of the true law, 
our process would have picked it out for us as being by very 
long odds more likely than either of the others — always 
remembering, of course, that such a conclusion is only justified 


In further explanation of this process of computation we present in Table XXIX
the computations exactly as they were carried out.


TABLE XXIX

COMPUTATION OF THE PROBABILITIES IN TABLE XXVIII

 $n_j$     $n_j - a$     $(n_j - a)/\sigma$    $\phi_{-1}$    $\phi''$       $\phi'''$

 $-\infty$    $-\infty$      $-\infty$         0.000000     0.000000     0.000000
  0.5     -3.552269     -2.162209      0.015302     0.141575     0.139522
  1.5     -2.552269     -1.553525      0.060151     0.168692    -0.108767
  2.5     -1.552269     -0.944841      0.172373    -0.027393    -0.508300
  3.5     -0.552269     -0.336157      0.368378    -0.334412    -0.365878
  4.5      0.447731      0.272526      0.607390    -0.355840     0.306482
  5.5      1.447731      0.881210      0.810897    -0.060467     0.530140
  6.5      2.447731      1.489894      0.931874     0.160388     0.152848
  7.5      3.447731      2.098578      0.982072     0.150168    -0.129979
  8.5      4.447731      2.707262      0.996608     0.064676    -0.119762
  9.5      5.447731      3.315945      0.999543     0.016334    -0.043325
 10.5      6.447731      3.924629      0.999957     0.002599    -0.008782
 11.5      7.447731      4.533313
 $\infty$     $\infty$       $\infty$          1.000000     0.000000     0.000000

The first three columns are self-explanatory. The next three are values read
from Appendix V. In the case of positive arguments they are read off directly, while



provided there are no reasons, not included in the mathematical 
computations, for hesitancy in accepting it. 


§ 109. Sheppard’s Corrections to Moments Computed from
Classified Data


In our illustrations we have carefully avoided the use of 
any data which should logically be distributed continuously, 
and have confined our attention solely to such data as naturally 
falls into discrete classes. The reason for this has been to 
avoid a difficulty which presents itself in computing the 
moments of continuously distributed data, when that data 
has been artificially classified. 

To see the nature of this difficulty let us consider the dis- 
tribution curve shown in Fig. 37, and in particular the “ class” 
between the values $n_1$ and $n_2$. This class ought, theoretically,


in the case of negative arguments use must be made of the fact that $\phi''$ is an even
function and $\phi'''$ an odd one, and also of the fact that $\phi_{-1}(-x) = 1 - \phi_{-1}(x)$.


TABLE XXIX.—Continued

 $n_j$     $A_3\phi''$     $A_4\phi'''$     $P(n_j)$      p(n)        n

 $-\infty$    0.000000      0.000000     0.000000
  0.5     -0.004698     -0.000710     0.009893     0.00989      0
  1.5     -0.005598      0.000554     0.055106     0.04521      1
  2.5      0.000909      0.002588     0.175870     0.12076      2
  3.5      0.011098      0.001863     0.381338     0.20547      3
  4.5      0.011809     -0.001560     0.617639     0.23630      4
  5.5      0.002007     -0.002699     0.810205     0.19257      5
  6.5     -0.005439     -0.000778     0.925657     0.11545      6
  7.5     -0.004984      0.000662     0.977750     0.05209      7
  8.5     -0.002146      0.000610     0.995071     0.01732      8
  9.5     -0.000542      0.000221     0.999222     0.00415      9
 10.5     -0.000086      0.000045     0.999915     0.00069     10
 11.5                                              0.00009     11 and 12
 $\infty$     0.000000      0.000000     1.000000

These three columns having been written down, they were multiplied by the
appropriate factors to give $A_3\phi''$ and $A_4\phi'''$, and then added to get P(n). The computation
of p(n) then requires only the subtraction of each of these P’s from the one which
follows it.




to contribute to the ith expectation an amount $\int_{n_1}^{n_2} n^i\,p(n)\,dn$.
If, however, the entire probability associated with the class is
artificially assigned to the mid-point, we obtain a value
$\left(\frac{n_1 + n_2}{2}\right)^i \int_{n_1}^{n_2} p(n)\,dn$, which is not the same thing at all, unless
the class interval is so exceedingly small that the quantities
p(n) and n are both substantially constant throughout it.

Fig. 37.

Now, what is true of these expectations in this
regard is true also of the moments of any experimental
data which we
may classify in this fashion. Over and above the accidental
differences between these moments and the theoretical expec- 
tations to which they correspond, there is a certain regular 
error due to the classification of the results: if by some odd 
turn of fortune the experimental data happened to be of just 
such a nature that its moments accurately corresponded with 
the expectations of its distribution law, the process of arrang- 
ing the data in classes would destroy this agreement. 

By studying the problem in the light of the Theory of 
Mechanical Quadratures, Sheppard was led to the conclusion 
that some of the error thus introduced could be eliminated 
by the use of the formulae which follow. The unstarred
quantities are the “classified” or “raw” moments; the
starred ones are the “corrected” moments, and h is the
“class interval” $n_2 - n_1$.

$$\begin{aligned}
\mu_1^*(n) &= \mu_1(n), \\
\mu_2^*(d) &= \mu_2(d) - \tfrac{1}{12} h^2, \\
\mu_3^*(d) &= \mu_3(d), \\
\mu_4^*(d) &= \mu_4(d) - \tfrac{1}{2} h^2 \mu_2^*(d) - \tfrac{1}{80} h^4.
\end{aligned} \tag{144}$$




Just what advantage these corrections possess is hard to 
say. It is not at all difficult to build up situations in which, 
instead of improving the moments, they make them less exact. 
But it must be admitted that these situations are always of a 
somewhat artificial sort, and that when a distribution of more 
usual aspect is considered the corrected moments are likely to 
be better than the “raw” ones. The consensus of opinion 
seems to be that they improve matters more often than not, 
and that they should be used. Certainly the error which they 
aim to eliminate is real enough; the only doubt concerns the 
possibility of correcting it by any other means than that of 
allowing continuous variation to the variable, which the 
limitations of instrumental measurement, for one thing, will 
not permit us to do. 

We may take as an illustration the data presented in Table 
XXIV, in the derivation of which the chips were marked in
classes in just the way in which an instrumental classification 
might arrange them, though the aim was to duplicate a Normal 
Universe. If the “raw” moments of this distribution of 
data are computed, the second and fourth (which are the only 
ones affected by Sheppard’s corrections) are found to be 1.044 
and 3.190, respectively, while the corrected moments are 1.023 
and 3.062 instead. As the expectations of the Normal Law, 
to which the data was intended to correspond, are 1 and 3, 
the corrected moments are materially improved in this case. 
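
The arithmetic of this illustration can be checked with a short sketch. The function name is ours, and the class interval h = 0.5 is an assumption: the text does not state it at this point, and the value is chosen only because it reproduces the quoted correction to the second moment. The formulas used are the standard Sheppard corrections for the second and fourth central moments.

```python
def sheppard_correct(mu2, mu4, h):
    """Return Sheppard-corrected second and fourth central moments.

    mu2, mu4 -- raw ("classified") moments about the mean
    h        -- class interval
    """
    mu2_star = mu2 - h**2 / 12.0
    mu4_star = mu4 - 0.5 * h**2 * mu2 + 7.0 * h**4 / 240.0
    return mu2_star, mu4_star


# Raw moments quoted in the text; h = 0.5 is an assumed class interval.
mu2_star, mu4_star = sheppard_correct(1.044, 3.190, h=0.5)
print(round(mu2_star, 3), round(mu4_star, 3))
```

Under that assumption the results fall within rounding of the corrected values 1.023 and 3.062 quoted above.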


§ 110. The Distribution of Statistics 


Each of the quantities listed in Appendix XI is known as 
a statistic of its distribution law. “Statistic,” then, is a
general term meaning “moment,” “expectation,” “flatness,”
“asymmetry,” “average,” or indeed any other property computed
from a distribution law, or from a set of data.

Suppose, now, that we have computed a “statistic” from
a set of data — to make matters concrete, suppose we have
computed the average of 50 observed numbers. What we thus 
get is not an absolute and invariable quantity which may be
reproduced whenever we wish by performing the experiment
anew: indeed, another experiment would very likely give us a 
different result. All possible results are not equally likely, 
however. The values which we get for the average are gov- 
erned by probability, in just the same way as any other quan- 
tity which is subject to accidental fluctuations. There is for it 
a certain distribution law, with its own standard deviation, 
asymmetry, and flatness. The same is true of any other 
“statistic” which we might name. 

We may raise the question, therefore, as to how widely, and 
according to what law, such a statistic may be expected to 
vary. To go into this question in detail would involve us in 
an extended discussion of “precision of measurement,” which
is beyond the province of our text. We may mention in passing,
however, that the four most important statistics — the
average, the standard deviation, the asymmetry, and the
flatness — have been shown to obey laws of distribution which
are very nearly normal.¹ Hence, since the Normal Law is
determined solely by its standard deviation, we may get a 
pretty fair estimate of the faith we are justified in putting in 
the observed values of any one of these statistics by knowing 
the standard deviation of the statistic in question. We give, 
in Table XXX and in Appendix IX the standard deviation of 
each of these statistics, when computed from a set of N data. 

As an illustration of the use of these formulae, we may
compute the standard deviations of the statistics which we
derived from Weldon’s dice data, and which are presented in
Table XXV. The formula for the standard deviation of the
average, as taken from Table XXX, is $\sigma(\bar{n}) = \sigma/\sqrt{N}$, which
in the case of the present numbers becomes

$$\sigma(\bar{n}) = \frac{1.642889}{\sqrt{26{,}306}} = 0.01013.$$


¹ They are exactly normal when derived from data which itself satisfies the Normal
Law, and approximately normal in other cases.




Hence, in accordance with the usual practice in indicating the
“precision” of our statistic, we should write

$$\bar{n} = 4.0523 \pm 0.0101,$$


as explained at the end of § 98. 

It is interesting to note that this agrees with the value
given in § 98 for the precision of p; for obviously, if the limits of
$\bar{n}$ are those set forth above, the limits upon p should be
one-twelfth as great. Thus we get

$$p = 0.3377 \pm 0.00084,$$

as before.
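
The two figures just obtained can be reproduced in a few lines. This is only a sketch: the variable names are ours, and the numbers are those quoted in the text for Weldon's data, with the factor of twelve taken from the passage above.

```python
import math

sigma = 1.642889   # standard deviation of the data (from the text)
N = 26306          # number of observations (from the text)
mean = 4.0523      # observed average (from the text)

# Standard deviation of the average: sigma / sqrt(N).
sd_mean = sigma / math.sqrt(N)

# p and its limits are one-twelfth of the average and of its limits.
p = mean / 12
sd_p = sd_mean / 12

print(f"average = {mean:.4f} +/- {sd_mean:.4f}")
print(f"p       = {p:.4f} +/- {sd_p:.5f}")
```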


TABLE XXX
STANDARD DEVIATIONS OF IMPORTANT STATISTICS

(N is the number of observations from which the statistic is computed)

Statistic | Symbol | Symbol of its standard deviation | General formula | Normal Law | Poisson Law | Binomial Law
Average | $\bar{n}$ | $\sigma(\bar{n})$ | $\sigma/\sqrt{N}$ | $\sigma/\sqrt{N}$ | $\sqrt{\epsilon/N}$ | $\sqrt{mp(1-p)/N}$
Standard Deviation | $\sigma$ | $\sigma(\sigma)$ | $\sqrt{\dfrac{\mu_4(d) - [\mu_2(d)]^2}{4N\,\mu_2(d)}}$ | $\sqrt{\dfrac{\mu_2(d)}{2N}}$ | $\sqrt{\dfrac{2\epsilon + 1}{4N}}$ | $\sqrt{\dfrac{2(m-1)p(1-p) + (2p-1)^2}{4N}}$
Asymmetry (Skewness) | $\sqrt{\beta_1}$ | $\sigma(\sqrt{\beta_1})$ | | $\sqrt{6/N}$ | |
Flatness (Kurtosis) | $\beta_2$ | $\sigma(\beta_2)$ | | $\sqrt{24/N}$ | |


Next let us consider the standard deviation. By reference
to Table XXX we find that $\sigma(\sigma)$ should be given by the formula

$$\sigma(\sigma) = \sqrt{\frac{\mu_4(d) - [\mu_2(d)]^2}{4N\,\mu_2(d)}},$$

which readily works out to be 0.00703. Hence it would be
customary to write the result of the computation in Table XXV
in the form $\sigma = 1.6429 \pm 0.0070$.

In the case of $\sigma(\sqrt{\beta_1})$ and $\sigma(\beta_2)$ no general formulae are
given in Table XXX. The formulae for the Normal Law,
however, are usually a fair approximation, and are certainly so
in the present case where we are dealing with data that is
known to be distributed according to the Binomial Law, which
does not differ in any very essential way from the Normal
Law. If, then, we use the quantities $\sqrt{6/N}$ and $\sqrt{24/N}$ we
get the results $\sqrt{\beta_1} = 0.1991 \pm 0.0151$ and $\beta_2 = 2.8778 \pm 0.0302$, respectively.


§ 111. Control Charts


This knowledge as to the nature of the distribution law 
obeyed by the various statistics that are commonly derived 
from statistical data has been used in a very elegant way in 
the construction of “control charts,” the purpose of which 
is to reveal at a glance whether or not the universe from which 
a given sample was drawn was, or was not, of the type it was 
supposed to be. 

For example, let us consider samples of 100 individuals each,
taken from a supposedly normal universe with an average 0
and a standard deviation 2.6. We have said that each of the
four quantities $\bar{n}$, $\sigma$, $\sqrt{\beta_1}$, and $\beta_2$ is distributed according to the
Normal Law; and from Table XXX we find that their standard
deviations, when gotten from samples of 100, are

$$
\begin{aligned}
\sigma(\bar{n}) &= 0.260,\\
\sigma(\sigma) &= 0.184,\\
\sigma(\sqrt{\beta_1}) &= 0.245,\\
\sigma(\beta_2) &= 0.490.
\end{aligned}
$$


As the chance of a deviation exceeding the standard deviation
by a factor of more than 2.5 is about 0.01, we may say
that the chances are 100 to 1 that the experimental average
will lie between $\pm\,0.65$, the standard deviation between
$2.6 \pm 0.46$, the asymmetry between $0.0 \pm 0.61$, and the flatness
between $3.00 \pm 1.23$, if the experimental data really comes
from the supposed universe.
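
The limits just quoted follow mechanically from the Normal Law formulas, and a small sketch makes the recipe explicit. The function names, the dictionary layout, and the made-up sample at the end are ours and purely illustrative; the band of 2.5 standard deviations is the one used in the text.

```python
import math


def control_limits(mean, sigma, n, k=2.5):
    """Bands of k standard deviations for four statistics of samples
    of size n drawn from a normal universe (Normal Law formulas)."""
    sd = {
        "average": sigma / math.sqrt(n),
        "standard deviation": sigma / math.sqrt(2 * n),
        "asymmetry": math.sqrt(6 / n),
        "flatness": math.sqrt(24 / n),
    }
    centre = {"average": mean, "standard deviation": sigma,
              "asymmetry": 0.0, "flatness": 3.0}
    return {name: (centre[name] - k * sd[name], centre[name] + k * sd[name])
            for name in sd}


def outside_band(observed, limits):
    """Names of the statistics in `observed` that fall outside their band."""
    return [name for name, value in observed.items()
            if not (limits[name][0] <= value <= limits[name][1])]


limits = control_limits(mean=0.0, sigma=2.6, n=100)
for name, (low, high) in limits.items():
    print(f"{name:>18}: {low:6.2f} to {high:6.2f}")

# A hypothetical sample whose flatness alone lies outside its band.
sample = {"average": 0.21, "standard deviation": 2.45,
          "asymmetry": -0.10, "flatness": 4.60}
print(outside_band(sample, limits))
```

The half-widths agree with the figures in the text to within rounding of the last digit.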




Suppose, now, that we were confronted with the problem of 
determining which of a large number of samples handed us 
really had come from such a universe, and which had not. 
Suppose that these samples were numbered 1, 2, 3,..., and 
that we computed from each of them the statistics mentioned. 


[Fig. 38. A Typical Control Chart. Panels: Average, Standard Deviation, Asymmetry, Flatness.]


Finally, suppose we plotted upon a chart, such as that shown 
in Fig. 38, the results obtained. The result would be, for each 
statistic, a jagged line drawn about the various expected values 
somewhat as shown in the figure. We would obviously have 
no reason to suspect those samples for which the statistics were 
all very near their expected values; or indeed any which lay 
well within the band marked off with the dotted line, and which
is, in each case, the band beyond which a statistic has only a
1 in 100 probability of appearing. On the other hand, a sample
for which certain statistics were outside this band would
be decidedly questionable; and one for which all were outside, 
as is the case with number 6, would almost certainly not have 
arisen from a Normal Universe. 

Such a chart as this is known as a Control Chart. It is 
used principally in factory inspection and kindred fields, where 
it is desired to know at a glance whether or not some extraneous 
influence is causing exceptional deviations from standard. 
Only a limited amount of computation is required to make the 
type of check to which we have referred. If desired, however, 
additional statistics could be included. For example, the use 
of $\chi^2$ would give an even more sensitive check, and in this case
the distribution function which it obeys is well known: it is
merely the $P(> \chi^2)$ given in Appendix VIII. The student
should have no difficulty in adding this also to his chart, if he 
so desires. 




PROBLEMS 


1. A “direct advertising” sales campaign is under consideration;
and a trial batch of 1000 circulars is sent out. It results in 19 favorable
replies. Assuming that the Binomial Law sufficiently well
represents the situation, state an upper and lower limit to the number
of replies which may be expected from 100,000 circulars, it being
understood that the limiting expectations are to be such that, beyond
them, the chance of the trial batch showing 19 returns is less than
0.01.


2. How large a test batch must be used in Problem 1, in order 
that the upper and lower limits expected from a subsequent batch 
of 100,000 shall not differ by more than 100?


3. What is the least number of favorable replies in Problem 1 to 
assure a lower limit of expectation of 2 per cent? 


4. Formulate a recommendation for a direct advertising campaign
which requires 2 per cent to pay, covering the following points,
1000 circulars being standardized as the first test: (a) What shall be
the “acceptance number” (that is, the number of favorable replies
that shall be accepted as conclusive)? (b) What shall be the “rejection
number”? (c) What procedure do you recommend in doubtful
cases?




5. The results of 27 independent solutions of Problem 3, § 80
were as follows:

[? = entry illegible]

Number
of Heads        Frequency of Occurrence (Individual Results)
    0         0     0     0     1     0     0     0     0     0     0
    1         0     0     1     0     0     1     0     0     0     0
    2         1     2     2     0     1     2     0     2     0     1
    3         5     6     8     6     4     4     7     7     4     4
    4        13    10     6    10    14    10     9     9    13     9
    5        11    11     8     6    10    13    10    13    15    10
    6        13    12    14    11    12    12     9    12    10    16
    7         4     7     7    11     7     7     8     4     5     7
    8         2     1     2     5     2     1     6     3     2     3
    9         1     1     2     0     0     0     1     0     1     0
   10         0     0     0     0     0     0     0     0     0     0
 Average    5.10  5.10  5.20  5.42  5.14  5.02  5.48  5.00  5.18  5.38

Number
of Heads        Frequency of Occurrence (Individual Results)
    0         0     0     0     0     0     0     0     0     0     0
    1         2     1     0     1     1     0     0     1     0     0
    2         1     3     2     3     4     2     3     1     3     3
    3         7     ?     3    11     3     8     9     7     6     4
    4        16     ?     9     7     8     8     7     7     7     7
    5        10     6    20    10    11    12     8    13    14    15
    6         7    11     7     8    11    11    12     9    12    11
    7         4     6     7     5     8     8     5     8     4     6
    8         2     4     2     5     4     1     5     3     1     4
    9         1     0     0     0     0     0     1     1     3     0
   10         0     0     0     0     0     0     0     0     0     0
 Average    4.68    ?   5.12  4.82  5.18  5.00  5.14  5.20  5.14  5.22

Number       Frequency of Occurrence                               Total for   Total for
of Heads     (Individual Results)                        Totals    First 24    Last 3
    0         0     0     0     0     0     0     0         1          1          0
    1         1     0     2     0     0     1     0        12         11          1
    2         2     1     1     1     2     2     1        46         41          5
    3         4     6     4     ?    10     6     5       165        144         21
    4        11     ?     ?     ?     ?     ?     7       267        236         31
    5         9     ?     ?     ?     ?     ?    15       306        267         39
    6        14     9    12    12     ?     ?     8       293        267         26
    7         7     ?     ?     ?     ?     ?    11       175        154         21
    8         2     2     3     1     2     1     1        70         66          4
    9         0     1     0     0     0     0     2        15         13          2
   10         0     0     0     0     0     0     0         0          0          0
 Average    5.10  5.12  4.90  5.12  4.70  4.80  5.42     5.092      5.107      4.873


Are the results contained in the column headed “ Totals ” consistent 
with the assumption that the pennies were unbiased? 


6. Each individual column in this array may be regarded as an 
independent experiment to determine the average number of heads. 
The individual averages are all shown at the bottom of the columns. 
What is the probability of this array of averages, if the pennies were 
unbiased? 


7. What is the probability that the total of the last three columns 
came from the same universe as the first 24? 


8. What is the probability of the totals obtained, if the pennies
were biased to the extent indicated by the average 5.092?


9. What would be the probability of obtaining the array of
averages which we have presented, if the pennies were so biased? 


10. Test the goodness of fit of the Poisson Law in the case of 
Table XV. 


REFERENCES FOR OUTSIDE READING

I. General:
1. Rietz: Mathematical Statistics.
2. R. A. Fisher: Statistical Methods for Research Workers.
3. Arne Fisher: Mathematical Theory of Probabilities.

II. Goodness of Fit:
4. Pearson: On the Criterion that a given System of Deviations
from the Probable in the Case of a Correlated System of
Variables is Such that it can be Reasonably Supposed to
Have Arisen from Random Sampling, Phil. Mag.,
Series V, Vol. 50 (1900), pp. 157-175.
5. R. A. Fisher: On the Interpretation of χ² from Contingency
Tables, and the Calculation of P, Journal of the
Royal Statistical Society, Vol. 85 (1922), pp. 87-94;
The Conditions under which χ² Measures the Discrepancy
between Observation and Hypothesis, Journal of the
Royal Statistical Society, Vol. 87 (1924), pp. 442-450.

III. Sheppard’s Corrections:
6. Sheppard: On the Calculation of the Most Probable Values
of Frequency Constants, for Data Arranged According to
Equidistant Divisions of a Scale, Proceedings of the
London Mathematical Society, Vol. 29 (1898), pp. 353-380.

IV. Precision of Statistics:
7. Pearson: On the Probable Errors of Frequency Constants,
Biometrika, Vol. 2 (1903), pp. 273-281; On the Probable
Errors of Frequency Constants: Part II, Biometrika,
Vol. 9 (1913), p. 1; On the Distribution of the Standard
Deviation of Small Samples, Biometrika, Vol. 10 (1915),
p. 522.
8. “Student”: The Probable Error of a Mean, Biometrika,
Vol. 6 (1908), pp. 1-25.
9. R. A. Fisher: Frequency Distribution of the Values of the
Correlation Coefficient in Samples from an Indefinitely
Large Population, Biometrika, Vol. 10 (1915), pp. 507-521.

V. Control Charts:
10. Shewhart: The Application of Statistics as an Aid in
Maintaining Quality of a Manufactured Product, Journal
of American Statistical Association, Vol. 20 (1925),
pp. 546-548; Quality Control Charts, Bell System
Technical Journal, Vol. 5 (1926), pp. 593-603.




CHAPTER X 


THE THEORY OF PROBABILITY AS APPLIED TO PROBLEMS
OF CONGESTION


§ 112. Introductory 


“Problems of Congestion” arise in any phase of industrial
life in which demands for service arise from a multiplicity of 
sources acting more or less independently of one another. 
For example, the turnstiles through which passengers pass into 
a subway platform are used by numerous individuals who act 
independently of one another to a large degree, though perhaps 
influenced by common working hours to travel for the most 
part at certain peak hours. The demands made upon a cash-carrier
system in a large store are “independent” in the same
broad sense, though obviously they are influenced by peak 
shopping hours. 

We have already had an example of one closely related 
problem. In § 86 we investigated the stock of dog-biscuit 
which should be carried by a grocery store under certain 
specified conditions. This, which we may for brevity refer to 
as the “warehouse problem,” is in fact the simplest of all 
congestion problems. In it, the only question asked is: “How
many demands will be made within a given time?” Obviously
this question is also part of the problem raised by the turnstile
or the cash-carrier, but not the whole problem. For the
passenger who uses the turnstile does not remove it permanently
from service, as the purchaser of dog-biscuit removes
it permanently from stock. Instead, the turnstile is “returned
to service” after the passenger is through with it: that is,
after a certain period called the holding-time. This holding-time
is a new element in the situation.



The fundamental turnstile problem therefore formulates
itself as follows: “Knowing the expected number of demands
per unit time and the expected holding-time, how many paths
(turnstiles, say) must be provided in order that the proportion
of persons inconvenienced shall not exceed a preassigned
amount?”

This question is, however, still quite indefinite, for “inconvenienced”
may mean a number of different things under
different circumstances. In the case of the turnstile or cash-carrier
it almost certainly means “delayed”; for the user does
not disappear if no apparatus is available. If, however, we
spoke of the number of chairs in a barber shop, a period of
congestion would probably result in a loss of trade — and would
therefore be to a certain degree its own cure, for the periods of
congestion would obviously be shorter than if all the customers 
waited until served. This is probably no comfort to the barber, 
but we — and he — must face the facts nevertheless. 

I doubt if my readers are likely to become barber-shop 
engineers; but there are other places where problems of an 
exactly analogous kind are faced in engineering experience. 
In telephone engineering, for instance, a certain number of 
trunk lines are provided between two exchanges, and when they 
are all busy the subscriber is given a “busy signal” which
causes him to hang up his receiver and repeat his call later on. 
This is not quite a case of “lost traffic,” for he very probably 
does repeat; but unless he repeats very soon — before the 
congestion is quite thoroughly cleared out —the length of 
such periods will be much the same as if he were to go away 
entirely. 

We have, therefore, two quite fundamental divisions in this 
problem of congestion: a “delayed traffic” division and a 
“lost traffic” division. It is our purpose in the present chapter 
to indicate how the Theory of Probability can be applied to 
these two problems, and to two others which we shall explain 
as they arise. 

As the methods of solution are the same no matter what the
particular engineering application may be, it makes little
difference in what language we phrase our study. As telephone
practice offers examples of widely diverse conditions, we shall
choose it; and to make the study more understandable we give
the following general explanation¹ of the terms which we are
to use:

When a subscriber makes a call, it is designed for some par- 
ticular person, and must therefore be steered toward that 
person, either by human intervention — as in manual practice 
—or by mechanical intervention — as in machine-switching 
(automatic) systems. Whatever performs this steering func- 
tion. we call a “ switch.” 

Obviously a switch must pass a call on to something else. 
We call that something a “channel.” Such a channel will 
ordinarily be one of a “ group” performing identical functions 
— that is, any one of the group, could accommodate our call. 
There may be other channels to which our call might have been 
assigned if it were going somewhere else — to a different office, 
say; these however are not part of our “group.” There may 
also be other channels leading to the place we wish our call 
sent, and which are available to other subscribers though not 
to us. These also are not part of our “group.” When we 
speak of a “group of channels” we shall mean a group each
member of which is capable of performing identical functions 
for identically the same calls as any other. 

Usually these calls can come from a number of sources. 
For example, if the group serves calling subscribers directly, 
there will usually be a number of subscribers capable of using 
any channel of the group in exactly the same way.² This is
the “group of sources” corresponding to the “group of
channels.” In such cases, there must be something to locate
a suitable channel and associate the calling source with it.
Whatever performs this function, whether human or mechanical,
we shall term a “switch.”


¹ It need hardly be said that it is not intended as a description of a telephone
exchange.

² But the source need not be a subscriber. It may be a switch to which he has
already entrusted his call.




We need no further knowledge of telephony for the purposes 
of our discussion. 


§ 113. Notation 
The principal symbols used are the following: 


n — the calling rate, that is, the expected number of calls
per source per hour.

T — the expected duration of a call, measured in hours.

λ — the number of sources in our group.

ν — the number of channels in our group.

p — the probability that a given source (sometimes a given
channel) is busy at a random instant of observation.

P(j) — the probability that if a particular group is examined
at a random instant it will be found to contain exactly
j busy members.

Π — the probability of a call being lost by reason of insufficient
equipment.

ε — the expected traffic density of our group; that is, the
expected number of busy sources (or channels).


To any of these symbols will be affixed such subscripts and 
superscripts as are necessary to characterize the particular 
conditions to which they are applied. 


§ 114. General Assumptions 


We make the following assumptions once for all as to the 
nature of our problem: 


Assumption 1—The system is in statistical equilibrium; 
in other words, the probability of finding it in any specified 
condition is independent of the time at which it is examined. 


It is quite true that no telephone exchange ever actually reaches 
a condition of statistical equilibrium. Its traffic varies from a light 
load at night to a peak sometime during the day, and then falls off 
again. During the time when the traffic is increasing, the probability 
of a large number of busy switches is also increasing and therefore
varying with the time. On the other hand, when the traffic is 
decreasing, the probability of lost calls is also decreasing. It is 
not hard to see that the probability of losing calls always lags
somewhat behind the traffic, reaching its peak shortly after the
peak load is reached, and its minimum shortly after the minimum
load occurs. This was illustrated in a simple way by the results
of Problems 8 and 9 of § 89. But when the periodic fluctuation
of the traffic is sufficiently slow, the peak value of the probability 
is substantially the same as the probability of loss figured on the 
basis of statistical equilibrium with the traffic density at its peak 
value; and in such cases it is safe to make the assumption stated 
above, designing the exchange entirely for the conditions of busy 
hour traffic. 


Assumption 2— Connection of sources to channels and their 
disconnection therefrom is effected instantaneously. 


This assumption is entirely tenable as long as the time consumed
in the operations of connecting and disconnecting is small compared 
to the duration of the average conversation. In other cases an 
independent investigation is necessary; but to take into account 
such minor complications would only serve to obscure the main 
purpose of the present discussion. 


Assumption 3 — The expected traffic density is the same for
every source.


This assumption is justified only by the fact that it is difficult to 
make any other which more nearly agrees with practical conditions. 
It does not mean that each subscriber originates the same number 
of calls. It means that the number of seconds in the busy hour 
during which a source is expected to be busy is the same for all 
sources. 

The assumption is evidently not satisfied in practice and some 
notion of the nature of the errors to which it leads is desirable. In 
an article on The Theory of Telephone Probabilities Applied to 
Trunking Problems in the Bell System Technical Journal for 
November, 1922, Mr. E. C. Molina has shown that it is on the side 
of safety, at least when the probability of congestion is small, as it 
usually is under operating conditions. 


Assumption 4— Busy sources make no calls. 


If this assumption is ignored, formulae will be obtained which
give the same probability of loss for the same amount of traffic, 
regardless of whether the number of sources is less than or greater 
than the number of available channels. As an extreme example, 
consider the case of 20 sources, each originating five calls per hour, 
and the alternative case of five sources, each originating 20 calls per 
hour, the average duration of the calls being two minutes in each
case. If each of these groups is assigned ten channels, such a formula
would say that the proportion of lost calls is the same in both cases. 
However, it is obvious from a common-sense standpoint that if there 
are only five sources they cannot make use of more than five channels. 
Hence in the second illustration no calls can possibly be lost, while 
it is possible for the twenty sources of the first illustration to want 
more than ten channels at once. 
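
The contrast drawn in this example can be put into numbers with a rough sketch. Two simplifications here are ours and are not the formulas developed later in the chapter: the formula that ignores the number of sources is represented by a Poisson tail on the total traffic, and the finite-source case by a binomial tail in which each source is taken to be busy, independently, for a fraction p = nT of the time. The function names are hypothetical.

```python
import math


def poisson_tail(mean, c):
    """P(at least c calls in progress) when the number in progress is
    Poisson with the given mean (number of sources ignored)."""
    below = sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(c))
    return 1.0 - below


def binomial_tail(sources, p, c):
    """P(at least c of `sources` independent sources are busy),
    each busy with probability p."""
    return sum(math.comb(sources, j) * p**j * (1 - p)**(sources - j)
               for j in range(c, sources + 1))


T = 2 / 60          # two-minute calls, in hours
channels = 10
results = {}
for sources, rate in [(20, 5), (5, 20)]:
    traffic = sources * rate * T            # same total traffic in both cases
    results[sources] = (
        poisson_tail(traffic, channels),            # blind to the number of sources
        binomial_tail(sources, rate * T, channels)  # respects the number of sources
    )
    print(sources, "sources:", results[sources])
```

The first figure of each pair is the same for both groups, since the total traffic is the same; the second is exactly zero for the five-source group, which can never occupy more than five channels.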


Assumption 5 — Either every channel which can serve a
source $S_1$ can also serve $S_2$, or else no channel can serve them
both.


We have really inferred this in the explanation given in § 112.
There are cases in telephone practice (the “graded multiple” is a
good example) which violate it.

Assumption 6— The number of busy channels in a group is 
equal to the number of busy members in its group of sources, 
except that in case lost calls are not instantly wiped out, the 
number of busy sources may exceed the total number of channels. 
When this latter situation arises, all channels are busy. 

This assumption is violated whenever the group of channels goes 
in only one of a number of different directions to which the sources 
have access; for obviously a source may be sending a call in one of 
these other directions. It is less often false in other engineering fields 
than in telephony. 


§ 115. Some Problems of Lost Traffic 


In order to illustrate the general principles involved we
shall develop six formulae for the probability of a call being
lost. They illustrate well the extent to which shades of
meaning must be carefully considered in dealing with such
problems, arising as they do, on the one hand, from three
slightly different assumptions as to how the traffic originates,
and on the other hand from two as to what happens when a
call is lost. The three which deal with the origination of calls
are:


Assumption 7 — The probability that a particular source will
originate a call during a given time interval is the same for every
interval at the beginning of which it is idle. It is not in any way
influenced by the condition of its group of channels. (Alternative
to Assumptions 8 and 9.)


Assumption 8 — The calls which are assigned to the group
of channels are distributed individually and collectively at random.¹
That is, the chance of the group being assigned a call
during a test interval is independent of the state of either group.
(Alternative to Assumptions 7 and 9.)


The difference between these assumptions may be illustrated as 
follows: If a group of ten channels is accessible to fifteen subscribers 
and to them only, it seems scarcely reasonable to assert that the 
chance of a call being originated within a second is the same if all 
trunks are busy at the beginning of the second as it is if all trunks 
are idle; for when all trunks are idle there are three times as many 
possible sources as when all are busy. To make such an assertion 
implies that individual subscribers are more likely to call when the 
group is busy than when it is idle: and this in turn implies fore- 
knowledge on their part. However, it is this assertion that is con- 
tained in Assumption 8. Unreasonable as it appears from this 
extreme illustration it will be found that it is often very near the 
fact when the sources of calls are not the subscribers themselves, 
but interoffice trunks and the like.²

A more reasonable assertion in case the subscriber is the source 
would be that the chance of a call originating during a short test 
interval is proportional to the number of idle subscribers; that is, 
in the case of the above illustration, that it is three times as great 
when all channels are idle as when all are busy. This is the condition 
implied by Assumption 7. It also is sometimes very near the truth, 
even when the sources are not subscribers’ lines. 
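
The distinction between the two assumptions can be stated in a few lines of code. This is only an illustrative sketch: the function name and the idea of returning a relative weight (rather than an actual probability) are ours. Under Assumption 7 the chance of a call originating in a short test interval is proportional to the number of idle sources; under Assumption 8 it does not depend on the state of the group at all.

```python
def origination_weight(busy, sources, assumption):
    """Relative chance that some source originates a call in a short
    test interval, given that `busy` of the `sources` are busy."""
    if assumption == 7:
        return sources - busy      # proportional to the idle sources
    if assumption == 8:
        return sources             # independent of the state of the group
    raise ValueError("assumption must be 7 or 8")


sources, channels = 15, 10         # the illustration in the text

for a in (7, 8):
    all_idle = origination_weight(0, sources, a)
    all_busy = origination_weight(channels, sources, a)
    print(f"Assumption {a}: idle/busy ratio = {all_idle / all_busy}")
```

For fifteen subscribers and ten channels the ratio is 3 under Assumption 7 and 1 under Assumption 8, as the text describes.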


Assumption 9 — The probability of a call being assigned to
the group of channels by some one of its sources is independent of
the condition of either group, unless all sources are busy, in
which case it is zero. In other words: calls are distributed individually
and collectively at random, except that none is made
when all channels are busy. (Alternative to Assumptions 7 and 8.)


¹ In this connection, see the footnote accompanying the definition of “collectively
at random” in § 84.

² It should also be noted that Assumption 8 implies, either that the chance of
all sources being simultaneously busy is zero, or that Assumption 4 is violated. Any
procedure which violates the latter assumption will be carefully avoided.




The discrepancies between the results given by 7 and 8 are 
frequently large enough for practical traffic densities that the use 
of the wrong formula would result either in inadequate or in extravagant
installation. The results of 8 and 9 are generally so nearly
alike as to be interchangeable in practice. The principal difference 
is, that while the use of 8 may, in extreme cases, require more channels 
than sources, 9 does not fall into this difficulty. Such a result is so 
absurd, however, that no one would put faith in it; so that the 
advantage which 9 appears to have in this respect is of doubtful 
value. 


The assumptions dealing with what happens to lost traffic
are:


Assumption 10 — If a call is lost because no channel is
available, the source which made it nevertheless continues to 
demand service. If during this time a channel becomes available 
it will be seized and rendered unavailable for others for the entire 
period that would have been required for the call if it had been 
successful, though the call will still be regarded as lost. (Alterna- 
tive to Assumption 11.) 


Assumption 11 — If a call is lost by virtue of insufficient
equipment its holding time is zero. (Alternative to Assumption 10.)


Assumption 10, of course, does not correspond to what actually 
takes place in telephone practice. If a call is unsuccessful, especially 
if the subscriber is informed of this fact, he is more likely to hang 
up his receiver quickly than otherwise. But there are problems to 
which Assumption 10 appears rigorously applicable. I believe 
certain types of fire-alarm apparatus are so designed that the sending 
mechanism continues to attempt to send an alarm for a fixed time, 
whether or not the alarm circuit is already in use. 

In order to avoid a considerable amount of circumlocution these
two conditions will be spoken of briefly as “lost calls held” and
“lost calls cleared,” and each will be considered under a separate
topical heading.¹




¹ “Lost calls held” must not be confused with “delayed calls”; for the latter
stand by until served and then consume a time equal to their holding time — as would
be the case if the fire-alarm mechanism were restrained from starting until its circuit
was free.




§ 116. The Elementary Probabilities; Lost Calls Held 


The two events of prime importance in a telephone system 
are the inception of a call and its termination. If the prob- 
ability of the occurrence of each of these events is known 
under all circumstances, it should be possible to determine 
exactly how much equipment is needed. Hence the attack 
will be begun by evaluating them in accordance with the 
assumptions given above, using, to begin with, 7 and 10 as the
particular pair of alternatives. 

Suppose a source is tested at a certain instant and observed 
for a short time, dt, thereafter. At the moment when it is 
first tested it must be either idle or busy. The probability 
that it is busy has already been denoted by p; the probability 
that it is idle must therefore be 1 — p. If the source is busy 
at the beginning of this interval, the only way in which it can 
originate a call is for the subscriber to close the call in progress 
and start another one during the interval. The chance of 
this happening can be made negligibly small by choosing a 
sufficiently short time interval dt. Then it will be true that
if the source is busy at the beginning of dt, it cannot possibly
originate a call before dt ends.

The chance that a source, idle at the beginning of dt, becomes
busy before its end is denoted by $p_i(b)$, and the chance
of the source originating a call during the time interval dt,
assuming that nothing is known about its condition at the
beginning of that interval, by $p(b)$. Then by the rule for
alternative compound probabilities the relationship

$$p(b) = (1 - p)\,p_i(b) + p\,p_b(b) \tag{145}$$

may be written down at once. In words this relation expresses
the logical proposition, that the chance of a source becoming
busy during a random time interval must be equal to the
chance that it is idle and being idle becomes busy, plus the
chance that it is busy and being busy becomes busy again.
The latter of these two chances is of course zero to the first
order of the small quantity dt, and in accordance with the
results of § 85 the random chance of a source originating a


330 PROBABILITY AND ITS ENGINEERING USES 


call is obviously $p(b) = n\,dt$, to the same degree of approximation.
Hence, inserting this value in equation (145), the
chance of an idle line becoming busy is found to be

$$p_i(b) = \frac{n\,dt}{1 - p}. \tag{146}$$

This is one of the elementary probabilities.

If the chance of a source becoming idle during dt is considered,
the logical proposition

$$p(i) = (1 - p)\,p_i(i) + p\,p_b(i) \tag{147}$$

is obtained, which expresses the fact that if a source becomes
idle, it must either be idle and become idle again (the chance
of which is negligibly small) or else it must be busy and become
idle. However, the source obviously becomes idle just as
often as it becomes busy, and if nothing whatever is known
about the condition of the line at the beginning of the time
interval dt, the chance of it becoming idle during that interval
must be the same as the chance of it becoming busy. In
other words, $p(i) = p(b) = n\,dt$. Inserting this value in (147)
it is found that

$$p_b(i) = \frac{n\,dt}{p} = \frac{dt}{T}. \tag{148}$$

If, instead of one source only, the entire group of λ sources
are examined and j of them are found busy at the beginning of
the interval, the chance that some one of these j busy sources
will become idle during dt is just j times as great as the chance
for one individual source. Also, if Assumption 7 is adopted
the chance that one of the λ − j idle sources will become busy
is just λ − j times as great as the chance for one individual
source. These latter probabilities are therefore¹

$$\frac{j\,dt}{T} \tag{149}$$

and

$$\frac{(\lambda - j)\,n\,dt}{1 - p} \tag{150}$$

respectively. In both these formulae p represents the chance
that an instant of observation, chosen at random, finds the
line busy. It is obviously equal to the proportion of the hour
during which the line may be expected to be busy; that is,
to nT. (See §§ 45-57.)

¹ There are two statements of a negative sort which it is worth while making with
respect to these elementary probabilities. In the first place, passing from (146) to
(150) does not imply that the calling rate n is the same for all sources. It is quite true
that if the j sources which are busy happen to be those which call with the least
frequency, the chance of a call originating during the time dt is greater than that given
by (150); while if they happen to be those which call with the greatest frequency
the reverse is the case. To mention special cases such as these, however, implies special
knowledge regarding the particular sources which are busy, and this, of course, is not
allowable. If the properly weighted average of these probabilities is formed it is
found to be identical with (150) above.

In the second place it is nowhere assumed that all calls are of the same length. What is
assumed is what is stated in Assumption 3: that p is the same for every source.
Although the method of derivation which has been used is entirely free from the
implication of equal holding times, the assumption of equality has been so frequently
employed by other writers, even when their results could just as well have been obtained
without it, that it seems well to point out that it is not here involved.


§ 117. Introduction of the Assumption of Statistical Equilibrium;
Lost Calls Held


It is now possible to introduce the condition stated in
Assumption 1; that is, to assert that the probability of the
system being in any specified condition is the same at the
end of the time interval dt as at its beginning.

Consider first a time at which all sources are idle. The
chance of this condition existing is very small if the group has
anything like the total amount of traffic which it can safely
handle. Nevertheless, the condition might occur and therefore
has some finite probability. This may be denoted by ${}^{\lambda}P'(0)$,
the prime signifying the condition of lost calls held and the
λ and 0 referring to the total number of sources in the group
and to the number which are busy. Since there are λ idle
sources the chance of some one of them becoming busy during
a time dt is λ n dt/(1 − p), and the probability that none will
become busy during this length of time is

$$1 - \frac{\lambda n\,dt}{1 - p}.$$

Hence the chance that all the sources are idle both at the
beginning and at the end of the interval is

$$\left(1 - \frac{\lambda n\,dt}{1 - p}\right){}^{\lambda}P'(0).$$

This is not, however, the total probability that none of the
sources is busy at the end of this interval, for it might happen
that at the beginning of the interval one source was busy, and
that during the interval this source became idle. Denoting
the probability that exactly one source is busy by ${}^{\lambda}P'(1)$, and
noting that the probability of this source becoming idle is, by
formula (149), dt/T, it is easily seen that the probability of it
being busy at the beginning of the interval and idle at the
end is

$$\frac{dt}{T}\,{}^{\lambda}P'(1).$$

There are other things which might conceivably happen 
during dt which would leave the entire group of sources idle 
when this interval closes. For example, two sources might 
be busy and both of them become idle. But if the chance
of a particular source becoming idle is dt/T, the chance that
both of two busy sources will become idle is the square of this,
and is therefore of the second order in the very small quantity
dt. In fact, a little consideration serves to show that the
probability of any one change taking place in the condition
of the set of sources is of the first order in dt; the probability
of any two changes is of the second order; and so on. Since
quantities of the second or higher orders in dt are so small as
to be negligible, it follows that there is no need of considering
the possibility of more than one such change taking place.
Hence, to the first order of small quantities, the probability
of all the sources of the system being idle at the end of the
interval dt is the sum of the probability that all were idle at
its beginning and remained so, and of the probability that one
only was busy at the start and that this one became idle.
These quantities having already been found, the probability
of all sources being idle at the end of dt is easily seen to be

$$\left(1 - \frac{\lambda n\,dt}{1 - p}\right){}^{\lambda}P'(0) + \frac{dt}{T}\,{}^{\lambda}P'(1).$$

To accord with the assumption of statistical equilibrium
this must be the same as ${}^{\lambda}P'(0)$. Forming the equation to
which this fact leads and making a few obvious cancellations,
it is found that

$$\frac{\lambda n}{1 - p}\,{}^{\lambda}P'(0) = \frac{1}{T}\,{}^{\lambda}P'(1). \tag{151}$$

A similar argument may be applied to the case where every
source is busy. The probability that some one of the sources
will become idle being λ dt/T, it follows that the probability
of none of them becoming idle is 1 − λ dt/T. If, therefore,
the probability that λ sources are busy at the beginning of
the interval is denoted by ${}^{\lambda}P'(\lambda)$, the chance that they will be
busy both at the beginning and at the end of the interval is

$$\left(1 - \frac{\lambda\,dt}{T}\right){}^{\lambda}P'(\lambda). \tag{152}$$

There is only one other way in which λ sources of the system
may be busy at the end of the interval without more than
one event taking place in the meantime. This occurs in case
λ − 1 sources are busy at the start and the remaining one
becomes busy before the interval ends. The probability of
this is the product of the probability ${}^{\lambda}P'(\lambda - 1)$ that exactly
λ − 1 sources are busy at the beginning of the interval, and
the probability n dt/(1 − p) that the remaining one becomes
busy before it ends. Adding this product to (152), and
remembering that the result must be ${}^{\lambda}P'(\lambda)$, the equation

$$\frac{n}{1 - p}\,{}^{\lambda}P'(\lambda - 1) = \frac{\lambda}{T}\,{}^{\lambda}P'(\lambda) \tag{153}$$

is obtained.




In general, the condition of j and only j sources busy at
the end of the interval may occur in any one of three ways:

(a) By exactly j being busy at the beginning of the interval
and no calls being originated or discontinued during it, the
probability of which is

$$\left(1 - \frac{(\lambda - j)\,n\,dt}{1 - p} - \frac{j\,dt}{T}\right){}^{\lambda}P'(j);$$

(b) By exactly j − 1 being busy at the beginning of the
interval and one new call originating, the probability of which
is

$$\frac{(\lambda - j + 1)\,n\,dt}{1 - p}\,{}^{\lambda}P'(j - 1);$$

(c) By exactly j + 1 being in progress at the beginning
of the interval and one being discontinued, the probability of
which is

$$\frac{(j + 1)\,dt}{T}\,{}^{\lambda}P'(j + 1).$$

By summing these three terms to get the complete probability
of exactly j busy sources at the end of the interval, and setting
this probability equal to ${}^{\lambda}P'(j)$, an equation is obtained which
may easily be reduced to the form

$$\left(\frac{(\lambda - j)\,n}{1 - p} + \frac{j}{T}\right){}^{\lambda}P'(j) = \frac{(\lambda - j + 1)\,n}{1 - p}\,{}^{\lambda}P'(j - 1) + \frac{j + 1}{T}\,{}^{\lambda}P'(j + 1). \tag{154}$$


§ 118. The Probability Formulae Corresponding to Assumptions
7 and 10


If the equations (151), (153), and (154) were linearly
independent they would be sufficient to determine the value
of the probabilities ${}^{\lambda}P'(j)$ for each j from 0 to λ; for there are
exactly λ + 1 of these probabilities and there are exactly
λ + 1 equations corresponding to them. It so happens, however,
that they are not linearly independent and an additional
equation is necessary to solve the problem. This is readily
obtained by remembering that

$$\sum_{j=0}^{\lambda} {}^{\lambda}P'(j) = 1. \tag{155}$$

Solving these equations, which may be done by the theory
of determinants,¹ it is found that

$${}^{\lambda}P'(j) = C^{\lambda}_{j}\,p^{j}\,(1 - p)^{\lambda - j}. \tag{156}$$

This equation gives with absolute accuracy the probability
of exactly j busy sources out of a total of λ at an arbitrary
instant when a test is made, provided Assumptions 7 and 10
are satisfied. It is, therefore, the probability that exactly
j subscribers will wish to use the group of channels simultaneously.

The formula itself is the usual Binomial Law for the
probability of an event happening j times in λ independent
trials, if the probability of success in a single trial is p. The
problem could have been so phrased that the answer would 
have been apparent at once: the longer method was adopted 
because it emphasizes the underlying hypotheses, and leaves 
no doubt as to the exact meaning of the answer when obtained. 
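
The recursive method of solution mentioned in the footnote to this section can be checked numerically. The following Python sketch is an illustration only and is not part of the original derivation: the function names and the sample values (6 sources, p = 0.3, T = 1) are assumptions chosen for the example. It builds the probabilities from (151) and (154), normalises them by (155), and sets the Binomial Law (156) beside them.

```python
from math import comb


def equilibrium_lost_calls_held(lam, p, T=1.0):
    """Solve (151), (154) and (155) step by step for P'(0), ..., P'(lam).

    lam : number of sources (at least 1)
    p   : chance that a source is busy, so the calling rate is n = p / T
    T   : mean holding time (hypothetical default of 1)
    """
    n = p / T

    def up(j):
        # Chance per unit time of j -> j + 1 busy sources, from (150).
        return (lam - j) * n / (1.0 - p)

    def down(j):
        # Chance per unit time of j -> j - 1 busy sources, from (149).
        return j / T

    weights = [1.0]                                   # unnormalised P'(0)
    weights.append(up(0) * weights[0] / down(1))      # equation (151)
    for j in range(1, lam):                           # equation (154)
        nxt = ((up(j) + down(j)) * weights[j]
               - up(j - 1) * weights[j - 1]) / down(j + 1)
        weights.append(nxt)
    total = sum(weights)                              # equation (155)
    return [w / total for w in weights]


def binomial_law(lam, p):
    """Right-hand member of (156)."""
    return [comb(lam, j) * p**j * (1 - p)**(lam - j) for j in range(lam + 1)]


recursive = equilibrium_lost_calls_held(6, 0.3)
direct = binomial_law(6, 0.3)
largest_gap = max(abs(a - b) for a, b in zip(recursive, direct))
```

Under these assumptions `largest_gap` is of the order of rounding error only, which is what the argument of this section requires.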

To find the probability of a call being lost we make use of 
the following argument: If we choose an interval at random, 
and observe the system during this interval, a call may or may 
not occur. If it does, it may or may not be lost; but since 
the interval has been chosen at random, without regard for 
the state of the system, “the probability that it is lost if it
occurs” is just exactly the thing that we mean by the words
“the probability of loss.”

In the form in which we have stated it, however, this is
a conditional probability, and to it the formula (20) may be
applied at once if we let the symbol AB mean “a call occurs


¹ An alternative method of solution is: to find ${}^{\lambda}P'(1)$ in terms of ${}^{\lambda}P'(0)$ from (151);
then by writing j = 1 in (154) to find ${}^{\lambda}P'(2)$ in terms of ${}^{\lambda}P'(0)$; next by writing j = 2
in (154) to find ${}^{\lambda}P'(3)$ in terms of ${}^{\lambda}P'(0)$; and so on. After every ${}^{\lambda}P'(j)$ has been
so expressed, ${}^{\lambda}P'(0)$ may be found from (155).




and is lost,” the symbol A, “a call occurs,” and the symbol
B, “it is lost.”

As for the chance of a call occurring and being lost, that
is, P(AB), that is just

$$\sum_{j=\nu}^{\lambda} \frac{(\lambda - j)\,n\,dt}{1 - p}\,{}^{\lambda}P'(j) = \lambda n\,dt \sum_{j=\nu}^{\lambda - 1} C^{\lambda - 1}_{j}\,p^{j}\,(1 - p)^{\lambda - 1 - j}; \tag{157}$$

for if ν or more sources are busy during our interval dt and a
call occurs it will of necessity be lost. And as for the chance
of a call occurring, that is, P(A), that is just λ n dt. So
substituting these values in (20) and making certain simple
rearrangements we get for P(B), or ${}^{\lambda}\Pi'$, the form

$${}^{\lambda}\Pi' = \left(\frac{\lambda - \varepsilon}{\lambda}\right)^{\lambda - 1} \sum_{j=\nu}^{\lambda - 1} C^{\lambda - 1}_{j} \left(\frac{\varepsilon}{\lambda - \varepsilon}\right)^{j}, \tag{158}$$

where ε, or λp, is the expected traffic density in the group.
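
Formula (158) is easily evaluated by machine. A minimal Python sketch follows, assuming the binomial form of (158) with p = ε/λ; the function name `loss_binomial` and its argument names are illustrative and are not the author's.

```python
from math import comb


def loss_binomial(lam, nu, eps):
    """Probability of loss by (158): lost calls held, independent sources.

    lam : number of sources
    nu  : number of channels
    eps : traffic density of the group (eps = lam * p, with eps < lam)
    """
    if lam <= nu:
        return 0.0                      # never more calls than channels
    p = eps / lam                       # chance that a given source is busy
    # Sum over the states j = nu, ..., lam - 1 in which a new call is lost.
    return sum(comb(lam - 1, j) * p**j * (1 - p)**(lam - 1 - j)
               for j in range(nu, lam))


# Ten channels carrying a traffic density of 4, fed by 20 and by 100 sources.
few_sources = loss_binomial(20, 10, 4.0)
many_sources = loss_binomial(100, 10, 4.0)
```

With the traffic density held fixed, the loss computed this way grows as the same traffic is spread over more sources.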


§ 119. The Probability Formulae Corresponding to Assumptions
8 and 10


Formulae (156) and (158) are more complicated than is
necessary for many purposes, for it frequently happens that the
traffic arises from a very large number of sources, each one of
which is busy but a small fraction of the time. In such cases
the number of idle sources, and therefore the chance of a
new call, is substantially the same at every instant, and we
should expect to be able to find a suitable formula in a simpler
form. Indeed we find, by allowing the number of sources
to increase indefinitely without changing either ε or ν, that
(158) approaches the simpler expression

$$\Pi' = e^{-\varepsilon} \sum_{j=\nu}^{\infty} \frac{\varepsilon^{j}}{j!}. \tag{159}$$

This formula is much used in computing trunk groups.
The corresponding formula for the probability of exactly
j busy sources is obtained by taking the limit of (156), the
result being, as we have seen in § 83,

$$P'(j) = \frac{\varepsilon^{j}\,e^{-\varepsilon}}{j!}, \tag{160}$$

which is just the familiar Poisson Formula.
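
As an illustration of how (159) and (160) may be computed, a short Python sketch is given here. It assumes the Poisson forms of the two formulae; the function names are hypothetical, and the infinite sum in (159) is replaced by one minus the finite sum of the terms below ν, which is a choice of convenience made for this example.

```python
from math import exp, factorial


def poisson_busy(j, eps):
    """Probability of exactly j busy sources, formula (160)."""
    return eps**j * exp(-eps) / factorial(j)


def loss_poisson(nu, eps):
    """Probability of loss, formula (159).

    The tail from nu to infinity is found as the complement of the
    finite head from 0 to nu - 1.
    """
    return 1.0 - sum(poisson_busy(j, eps) for j in range(nu))


# Ten channels with a traffic density of 4.
example = loss_poisson(10, 4.0)
```

The result does not depend on the number of sources at all, in keeping with the passage to the limit by which (159) was obtained.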

The manner in which (159) has been derived suggests that 
it is only accurate when the number of sources greatly exceeds 
the number of channels to which they have access. As a 
matter of fact, this is true if each source is independent of the 
rest as required by Assumption 7. However, the usefulness 
of (159) is actually much broader than this statement would 
imply, as can be shown by placing it upon a slightly different 
foundation, as follows: 

Since ε = λnT = λp, it follows that as λ is increased
indefinitely, nT and p must each decrease according to the law
nT = p = ε/λ. Inserting this in (150) we find that the
chance of a call being originated when j sources are busy is

$$\frac{\lambda - j}{\lambda - \varepsilon}\,\frac{\varepsilon\,dt}{T},$$

a quantity which approaches the limit ε dt/T as λ increases
indefinitely. Since this limit is independent of j it follows
that formula (159) corresponds to Assumption 8, that is, to the
case where the calls are distributed individually and collectively
at random.
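
The approach to this limit may be illustrated numerically. In the Python sketch below the function name `call_rate` and the sample values are assumptions made for illustration; the expression is simply (150) with nT = p = ε/λ, divided by dt.

```python
def call_rate(lam, j, eps, T=1.0):
    """Chance per unit time of a new call when j of lam sources are busy.

    Obtained from (150) by writing n = eps / (lam * T) and p = eps / lam.
    T is the mean holding time (hypothetical default of 1).
    """
    return (lam - j) / (lam - eps) * eps / T


# With few sources the rate depends strongly on j; with many it does not.
small_group = [call_rate(10, j, 4.0) for j in (0, 5, 10)]
large_group = [call_rate(10**6, j, 4.0) for j in (0, 5, 10)]
```

For the small group the rate falls to zero when every source is busy, whereas for the large group all three values lie close to ε/T.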


From a practical standpoint Assumption 8 is inconsistent with a 
limited number of sources, for in practice it must always be true 
that the chance of a new call being originated when all the sources 
are busy is zero; and it is therefore dependent on the number of busy 
sources to just that extent. 

This practical difficulty is reflected in the theory in the sense
that the combination of Assumptions 8 and 10 is inconsistent with
Assumption 4 unless the number of sources is infinite. For the purpose
of this paragraph, therefore, Assumption 4 may be regarded as
ignored. The same difficulty will not arise when Assumption 8 is
combined with Assumption 11 unless the number of channels is at
least as great as the number of sources.

Practically speaking, these difficulties in harmonizing our assump- 
tions are unimportant unless the chance of all sources being busy 
simultaneously is quite large. Moreover, it is actually true that 
(159) and (160) are extremely valuable in many cases where the 
number of sources exceeds the number of channels by a sufficiently 
wide margin. 


§ 120. The Probability Formulae Corresponding to Assumptions
9 and 10


The practical analogue of Assumption 8 in the case of a
limited number of sources is Assumption 9, which states that
so long as any sources are idle the chance of a new call being
originated is independent of their number, but that as soon
as all sources become busy the chance of a new call being
originated drops to zero. If the method of computation
which has been used in obtaining formula (158) is applied to
this set of assumptions the results

$$P'(j) = \frac{\varepsilon^{j}/j!}{\sum_{i=0}^{\lambda} \varepsilon^{i}/i!} \tag{161}$$

and

$$\Pi' = \frac{\sum_{j=\nu}^{\lambda - 1} \varepsilon^{j}/j!}{\sum_{i=0}^{\lambda} \varepsilon^{i}/i!} \tag{162}$$

are obtained.
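
A numerical sketch of (161) and (162) in Python may make their behavior plainer. It rests on assumptions that should be stated: the denominator of both formulae is taken to be the sum of ε^i/i! from i = 0 to λ, the numerator of (162) is taken to run from ν to λ − 1, and the function names are hypothetical.

```python
from math import factorial


def busy_truncated(j, lam, eps):
    """Probability of exactly j busy sources, formula (161) as read here."""
    if j > lam:
        return 0.0                       # more busy sources than exist
    denom = sum(eps**i / factorial(i) for i in range(lam + 1))
    return (eps**j / factorial(j)) / denom


def loss_truncated(lam, nu, eps):
    """Probability of loss, formula (162) as read here."""
    if lam <= nu:
        return 0.0                       # empty numerator: no call is lost
    return sum(busy_truncated(j, lam, eps) for j in range(nu, lam))


# Ten channels, traffic density 4, with 11 and with 100 sources.
just_above = loss_truncated(11, 10, 4.0)
far_above = loss_truncated(100, 10, 4.0)
```

When the number of sources greatly exceeds the number of channels the second value is practically that of the Poisson formula (159).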

It can be shown that in most instances these formulae give approximately
the same values as those obtained from (159) and (160).
Practical conditions usually require that the probability of loss shall
be small. This means, of course, that the terms in the numerator of
(162) must be small, and it is easy to see that if this is true the difference
between the denominator and a similar expression summed
from zero to infinity is negligibly small. The latter expression, however,
is the series expansion for $e^{\varepsilon}$. If this approximation is substituted
for the denominator, (161) immediately becomes identical
with (160).

Likewise if it is true that λ is much larger than ν the difference
between the numerator of (162) and a similar expression summed
from ν to infinity will be negligible and (162) will reduce to (159).
In other words, (161) is always sensibly equal to (160) under practical
conditions, and (162) is approximately equal to (159) except when
the number of sources is very nearly the same as the number of
channels. These qualitative assertions will be given quantitative
illustration in §§ 126 and 127.

The one vital difference between (162) and (159) is that (159)
does not depend upon λ at all and therefore gives a finite probability
of loss even when the number of channels exceeds the number of 
sources — an absurd result to which (162) does not lead. 

This absurdity is merely the practical manifestation of the remark 
made in §119: that Assumption 8 is not strictly tenable in any 
case where the number of sources is limited. 

The present section and the two which precede it contain
formulae corresponding to the conditions of lost calls held, both
when the sources of calls are assumed to be independent and 
when they are assumed to be dependent upon one another in 
such a way that the chance of a call originating is influenced 
by the number of busy sources. It is necessary next to obtain 
analogous results for the condition of lost calls cleared. 


§ 121. The Elementary Probabilities; Lost Calls Cleared 


A careful consideration of the derivation of (146) shows 
that the form of this equation is not affected by shifting to the 
assumption of lost calls cleared. The value of p, however, is
somewhat altered, due to the fact that unsuccessful calls contribute
nothing to the busy time of the sources. Hence instead
of p = nT we now have p = (1 − Π)nT.

In the development of the second elementary probability,
the logical proposition stated in (147) is no longer true, since
it can no longer be asserted that the probability of an idle
source becoming busy and idle again during the interval dt
is negligible; for if such an idle source were to originate a call at
a time when there was no available channel to receive it, it
would instantly become idle again. Thus, in effect, an idle
source becomes idle, and $p_i(i)$ is not zero. Instead, $p_i(i)$ is
now equal to the probability that all channels are occupied
during dt, and that our source, which is idle, originates a call.
We find at once

$$p_i(i) = \Pi\,\frac{n\,dt}{1 - p}.$$

Inserting this in (147), we get

$$p_b(i) = \frac{(1 - \Pi)\,n\,dt}{p},$$

and then remembering that p is now (1 − Π)nT, we are again
led to the same formula dt/T as before.


It is obvious from a common-sense standpoint that the progress 
of a successful call, after its connection is established, should be in 
no way influenced by unsuccessful calls. In particular, the prob- 
ability of termination and the holding time should be unaltered, 
whatever becomes of the unsuccessful calls. It would seem apparent, 
therefore, that if the chance of a busy line becoming idle is expressed 
in terms of dt and T only, the formula which results must be valid 
either for lost calls held or for lost calls cleared. This would estab- 
lish the validity of (148) even if the method by which it was originally 
derived had introduced Assumption 9. 


§ 122. The Probability Formulae Corresponding to Assumptions
7 and 11


Having seen that both elementary probabilities are expres- 
sible in the same form as in the preceding case, it may be 
inferred at once that the form of equations (151) and (154) 
remains unchanged. This inference is borne out by an independent
investigation. There is this difference in the circumstances,
however: that, whereas in the preceding case the
maximum number of sources which might be simultaneously
busy was λ, in the present instance the number is ν. Equations
(153) and (155) must therefore be reconsidered.

Suppose the probability of exactly ν channels being simultaneously
busy is ${}^{\lambda}P''(\nu)$, the λ and ν having the same significance
as before and the double prime relating to the condition
of lost calls cleared. If the system is to be in this condition
at the end of a short interval dt, it may either have been so at
the beginning of the interval and remained unchanged, or else
it may have had just one idle channel at the beginning of the
interval, this one becoming busy meanwhile. Taking both
of these possibilities into account and introducing the principle
of statistical equilibrium in exactly the same fashion as has
been done above, it may be easily seen that (153) must be
replaced by

$$\frac{\nu}{T}\,{}^{\lambda}P''(\nu) = \frac{\lambda - \nu + 1}{1 - p}\,n\,{}^{\lambda}P''(\nu - 1).$$

Similarly the sum of the probabilities of each number of
busy sources from 0 to ν must be equal to unity, since it is
impossible for this number to exceed ν. This gives

$$\sum_{j=0}^{\nu} {}^{\lambda}P''(j) = 1,$$

which takes the place of (155).
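Having thus obtained the necessary independent equations,
their solution can be carried out very easily by the use of
determinants. The result is

$${}^{\lambda}P''(j) = \frac{C^{\lambda}_{j}\left(\dfrac{nT}{1 - p}\right)^{j}}{\sum_{i=0}^{\nu} C^{\lambda}_{i}\left(\dfrac{nT}{1 - p}\right)^{i}}. \tag{163}$$

It is desirable to express this formula in terms of the traffic
density of the group. This traffic density is, as before,
ε = λnT; whence

$$\frac{nT}{1 - p} = \frac{\varepsilon}{\lambda - (1 - \Pi)\,\varepsilon}.$$

Hence in terms of ε, (163) becomes

$${}^{\lambda}P''(j) = \frac{C^{\lambda}_{j}\left(\dfrac{\varepsilon}{\lambda - (1 - \Pi)\,\varepsilon}\right)^{j}}{\sum_{i=0}^{\nu} C^{\lambda}_{i}\left(\dfrac{\varepsilon}{\lambda - (1 - \Pi)\,\varepsilon}\right)^{i}}. \tag{164}$$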



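The probability of loss is obtained by the same argument
as in § 118. The chance of a call being originated during a
short interval of observation is λ n dt as before. The chance
of a call being lost, however, is much simpler, since it is now
impossible for more than ν sources to be busy simultaneously.
It is

$${}^{\lambda}P''(\nu)\,\frac{\lambda - \nu}{1 - p}\,n\,dt.$$

The ratio of these two quantities is the probability of loss.
It is easily seen to be

$${}^{\lambda}\Pi'' = \frac{\lambda - \nu}{\lambda - (1 - \Pi)\,\varepsilon}\cdot\frac{C^{\lambda}_{\nu}\left(\dfrac{\varepsilon}{\lambda - (1 - \Pi)\,\varepsilon}\right)^{\nu}}{\sum_{j=0}^{\nu} C^{\lambda}_{j}\left(\dfrac{\varepsilon}{\lambda - (1 - \Pi)\,\varepsilon}\right)^{j}}.$$

A less obvious form, but one which is more convenient for
computation is

$${}^{\lambda}\Pi'' = \frac{C^{\lambda - 1}_{\nu}\left(\dfrac{\varepsilon}{\lambda - (1 - \Pi)\,\varepsilon}\right)^{\nu}}{\sum_{j=0}^{\nu} C^{\lambda - 1}_{j}\left(\dfrac{\varepsilon}{\lambda - (1 - \Pi)\,\varepsilon}\right)^{j}}. \tag{165}$$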
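
Since Π appears on both sides of (165), a numerical value is most simply had by successive substitution. The Python sketch below is one such scheme, offered as an illustration under stated assumptions: the starting value Π = 0, the tolerance, the limit on the number of passes and the function name `loss_engset` are all choices made here, and the method of iteration is not taken from the text.

```python
from math import comb


def loss_engset(lam, nu, eps, tol=1e-12, max_iter=1000):
    """Probability of loss by (165): lost calls cleared, independent sources.

    lam : number of sources
    nu  : number of channels
    eps : traffic density of the group (eps = lam * n * T, with eps < lam)
    """
    if lam <= nu:
        return 0.0                           # no call can ever be lost
    loss = 0.0                               # assumed starting value
    for _ in range(max_iter):
        x = eps / (lam - (1.0 - loss) * eps)  # the ratio nT / (1 - p)
        numerator = comb(lam - 1, nu) * x**nu
        denominator = sum(comb(lam - 1, j) * x**j for j in range(nu + 1))
        new_loss = numerator / denominator
        if abs(new_loss - loss) < tol:
            return new_loss
        loss = new_loss
    return loss                              # last iterate if not converged


# Ten channels, traffic density 4, fed by 20 sources.
example = loss_engset(20, 10, 4.0)
```

Each pass inserts the latest estimate of Π into the right-hand member of (165); in the example the estimates settle after a few passes.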


§ 123. The Probability Formulae Corresponding to Assumptions
8 and 11


Formulae (164) and (165) are analogous to (156) and (158).
From them others may be derived which are appropriate when
the sources are independent and their number greatly exceeds
the number of channels, or when the sources, though not very
numerous, are so related that as more and more of them become
busy the individual calling rates of those which remain idle
increase at a rate which just neutralizes their decrease in
number. This is done by taking the limits of (164) and (165)
as λ becomes infinite, just as was done in § 119. In this way
formulae analogous to (159) and (160) are obtained. They
are

$$P''(j) = \frac{\varepsilon^{j}/j!}{\sum_{i=0}^{\nu} \varepsilon^{i}/i!} \tag{166}$$

and

$$\Pi'' = \frac{\varepsilon^{\nu}/\nu!}{\sum_{j=0}^{\nu} \varepsilon^{j}/j!}. \tag{167}$$

As has been said in connection with formulae (159) and (160),
Assumption 8 is not tenable if it is possible for all sources to be
busy simultaneously, for in this case there are no idle sources left to
originate calls. This manifests itself in the fact that (167), like
(159), gives a finite probability of loss, even when λ ≤ ν. For
this reason, Assumption 4 must be ignored in developing (167), if
λ ≤ ν.
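
Formulae (166) and (167) lend themselves to direct computation. The following Python sketch is a plain transcription of the two formulae and nothing more; the function names are hypothetical, and no attempt is made at the economies that would be wanted for very large groups.

```python
from math import factorial


def busy_erlang(j, nu, eps):
    """Probability of exactly j busy channels, formula (166)."""
    if j > nu:
        return 0.0                       # lost calls are cleared at once
    denom = sum(eps**i / factorial(i) for i in range(nu + 1))
    return (eps**j / factorial(j)) / denom


def loss_erlang(nu, eps):
    """Probability of loss, formula (167): the chance that all nu are busy."""
    return busy_erlang(nu, nu, eps)


# Ten channels with a traffic density of 4.
example = loss_erlang(10, 4.0)
```

The loss here is simply the proportion of time during which every channel is occupied, since under Assumption 8 the chance of a new call is the same at every instant.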


§ 124. The Probability Formulae Corresponding to Assumptions
9 and 11


No change is made in either (166) or (167) when Assumption
8 is replaced by Assumption 9, unless λ ≤ ν. However, if
λ ≤ ν they become, respectively,

$$P''(j) = \frac{\varepsilon^{j}/j!}{\sum_{i=0}^{\lambda} \varepsilon^{i}/i!} \tag{168}$$

and

$$\Pi'' = 0. \tag{169}$$

These formulae are identical with (161) and (162).¹ That this
is to be expected is obvious, since when no calls are lost, what
happens to lost calls is immaterial.

¹ When λ ≤ ν the upper index of summation in the numerator of (162) is less than
the lower index, so that the formula is meaningless as written in § 120. Its true value,
however, is zero, as listed in Table XXXII.


TABLE XXXI

SCHEMATIC REPRESENTATION OF NOTATION

Assumptions as to                               Assumptions as to Treatment of Lost Calls
Origination of Calls                            Lost Calls Cleared            Lost Calls Held
                                                (Assumption 11)               (Assumption 10)

Sources independent                             I                             II
(Assumption 7)                                  ${}^{\lambda}P''(j)$, ${}^{\lambda}\Pi''$          ${}^{\lambda}P'(j)$, ${}^{\lambda}\Pi'$

Calls occur individually and collectively       III                           IV
at random (Assumption 8)                        $P''(j)$, $\Pi''$                 $P'(j)$, $\Pi'$

Calls occur individually and collectively at    V                             VI
random, unless all sources are busy;            $P''(j)$, $\Pi''$                 $P'(j)$, $\Pi'$
then none occur (Assumption 9)

Note. The Roman numerals are for the purpose of identification in connection
with the curves which follow.

Formulae I, II, III and IV are known, respectively, by the names Engset, Binomial,
Erlang and Poisson.


§ 125. Recapitulation of Formulae


Formulae have now been obtained corresponding to each
of the six alternative pairs of assumptions, and it is desirable
to collect them together for purposes of reference. This is
done in Tables XXXI to XXXIII.

Table XXXI represents schematically the relationship of the
various formulae to the sets of assumptions upon which they
are based; while Tables XXXII and XXXIII give the formulae
themselves.

In developing these probabilities in the preceding pages
there has been no need to write down the circumstances under
which they take the value zero. This was evident from the
context. In the tables, however, these limiting conditions are
given in order that no ambiguity may be involved.


TABLE XXXII

PROBABILITY OF LOSS

Assumptions¹           Formulae                                                                          Reference Number

7 and 11               ${}^{\lambda}\Pi'' = 0$,  λ ≤ ν
                       ${}^{\lambda}\Pi'' = \dfrac{C^{\lambda-1}_{\nu}\left(\frac{\varepsilon}{\lambda-(1-\Pi)\varepsilon}\right)^{\nu}}{\sum_{j=0}^{\nu} C^{\lambda-1}_{j}\left(\frac{\varepsilon}{\lambda-(1-\Pi)\varepsilon}\right)^{j}}$,  λ > ν          (165)

7 and 10               ${}^{\lambda}\Pi' = 0$,  λ ≤ ν
                       ${}^{\lambda}\Pi' = \left(\frac{\lambda-\varepsilon}{\lambda}\right)^{\lambda-1}\sum_{j=\nu}^{\lambda-1} C^{\lambda-1}_{j}\left(\frac{\varepsilon}{\lambda-\varepsilon}\right)^{j}$,  λ > ν          (158)

8 and 11               $\Pi'' = \dfrac{\varepsilon^{\nu}/\nu!}{\sum_{j=0}^{\nu}\varepsilon^{j}/j!}$                                          (167)
(4 ignored if λ ≤ ν)

8 and 10               $\Pi' = e^{-\varepsilon}\sum_{j=\nu}^{\infty}\dfrac{\varepsilon^{j}}{j!}$                                         (159)
(4 ignored if λ
is finite)

9 and 11               $\Pi'' = 0$,  λ ≤ ν
                       $\Pi''$ as given by (167),  λ > ν                                                   (169)

9 and 10               $\Pi' = 0$,  λ ≤ ν
                       $\Pi' = \dfrac{\sum_{j=\nu}^{\lambda-1}\varepsilon^{j}/j!}{\sum_{j=0}^{\lambda}\varepsilon^{j}/j!}$,  λ > ν                           (162)

¹ Unless explicitly stated all assumptions from 1 to 6 are used.




§ 126. Numerical Comparison of Formulae; The Dependence
of the Probability of Loss upon the Number of Sources
when the Traffic Density of the Group is Held Constant


In order to gain some conception of the magnitude of the
differences between these various formulae, it is desirable to
present a few numerical examples which illustrate the essential
points of their behavior.

In the first place, the extent to which they depend upon
the number of sources may be considered. To illustrate this
point a group of ten channels¹ is chosen and it is assumed
that this group receives its traffic, sometimes from a few busy
sources, sometimes from many relatively idle ones, but always
in such a way that the traffic density is 4. The results plotted
against λ, the number of sources, are shown in Fig. 39, each
curve corresponding to one of the formulae of Table XXXII.

[Fig. 39. Comparison of Various Formulae for Probability of Loss when
the Traffic Density is Held Constant. Vertical axis: Probability of Loss, Π.
Horizontal axis: Number of Sources, λ.]

The two formulae corresponding to the assumption of independent
sources (the top row in the scheme of Table XXXI)
give the Curves I and II. These coincide with the λ-axis so
long as λ ≤ ν, and rise gradually as λ increases beyond this
value.

¹ Attention should be called to the fact that in this and the following sections
the particular numerical values chosen are such as to give quite appreciable differences
between the various formulae. It would be a mistake to infer that the differences are
always of this order of magnitude. As a matter of fact they may be either larger
or smaller. It may be stated as a rough general rule, though this rule like most
others has its exceptions, that where the groups of channels are large, the differences
will be smaller than those here obtained, and vice versa.


TABLE XXXIII

PROBABILITY OF CONGESTION j

Assumptions¹           Formulae                                                                          Reference Number

7 and 11               ${}^{\lambda}P''(j) = \dfrac{C^{\lambda}_{j}\left(\frac{\varepsilon}{\lambda-(1-\Pi)\varepsilon}\right)^{j}}{\sum_{i=0}^{\nu} C^{\lambda}_{i}\left(\frac{\varepsilon}{\lambda-(1-\Pi)\varepsilon}\right)^{i}}$,  j ≤ ν, λ > ν          (164)
                       ${}^{\lambda}P''(j) = {}^{\lambda}P'(j)$,  λ ≤ ν

7 and 10               ${}^{\lambda}P'(j) = C^{\lambda}_{j}\left(\frac{\lambda-\varepsilon}{\lambda}\right)^{\lambda}\left(\frac{\varepsilon}{\lambda-\varepsilon}\right)^{j}$,  0 ≤ j ≤ λ          (156)

8 and 11               $P''(j) = \dfrac{\varepsilon^{j}/j!}{\sum_{i=0}^{\nu}\varepsilon^{i}/i!}$,  j ≤ ν                                (166)
(4 ignored if λ ≤ ν)

8 and 10               $P'(j) = \dfrac{\varepsilon^{j}e^{-\varepsilon}}{j!}$,  j ≥ 0                                        (160)
(4 ignored if λ
is finite)

9 and 11               $P''(j) = P'(j)$ as given by (161),  λ ≤ ν                                       (168)
                       $P''(j)$ as given by (166),  λ > ν

9 and 10               $P'(j) = \dfrac{\varepsilon^{j}/j!}{\sum_{i=0}^{\lambda}\varepsilon^{i}/i!}$,  j ≤ λ                              (161)
                       $P'(j) = 0$,  j > λ

¹ Unless explicitly stated all assumptions from 1 to 6 are used.

The formulae corresponding to Assumption 8 give horizontal
lines, as is to be expected from the fact that the
assumption implies independence of λ. They are designated
III and IV. That they are asymptotic to the Curves I and II
is evident from the fact that (167) and (159) were obtained as
limiting cases of (165) and (158). They do not approach the
λ-axis even when λ ≤ ν, a fact which is merely the graphical
equivalent of the statement already made: that they give
a finite probability of loss even when the number of channels
exceeds the number of sources.

Curves V and VI, which correspond to the third row in
Table XXXI, occupy an intermediate position. They coincide
with the λ-axis for λ ≤ ν and thus avoid the absurd results to
which formulae (167) and (159) give rise. Indeed, the purpose
of the modification of Assumption 8 contained in Assumption 9
was exactly to avoid this absurdity.

For all values of λ which exceed ν, Curve V coincides with
Curve III. That this is as it should be is seen from the fact
that when λ exceeds ν it is not possible for all the sources to be
simultaneously busy, and hence the modification of Assumption
8 plays no part whatever. Curve VI, on the other hand, while
it rises more steeply than II, does not jump abruptly from zero
to its maximum value but approaches the latter asymptotically.
The reason for this lies in the fact that if lost calls are held,
there can be more than ν in progress at one time. In this case,
more than one must clear before a successful call can be made.
Thus, those calls which fail still produce a “hang-over” effect,
which interferes with the chance of success of other calls.
This “hang-over” becomes greater and greater as the number
of sources is increased. Indeed, it is this effect which is
responsible for the fact that all of the curves corresponding
to the first column of Table XXXI show greater probabilities
of loss than do the analogous curves of the second column.




It may seem surprising at first thought that the “hang-over”
effect should ever produce an increase in the number of lost calls
as great as that which is necessary to account for the difference
between Curves III and IV. In fact, Curve III says that, if lost calls
are cleared, only about one-half of one per cent of the calls are lost,
and Curve IV says that this small proportion, if held instead of
cleared, is capable of increasing the proportion of loss by about
50 per cent. It should be remembered in this connection, however,
that the very fact that calls are lost implies that they are originated
at a time when the system is already congested. Therefore, unless
they are quickly disposed of, a very few of them may lengthen the
period of congestion to a considerable extent and increase the
proportion of loss correspondingly.
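
The size of this effect can be checked numerically. The sketch below assumes that formula (167) is the familiar lost-calls-cleared (Erlang) loss expression and that (159) is the Poisson sum for lost calls held; neither formula is reproduced in this section, so both identifications, like the function names, are assumptions of the example. It uses the ten channels and the traffic density of 4 on which Fig. 39 is based.

```python
from math import exp, factorial

def loss_lost_calls_cleared(channels: int, density: float) -> float:
    """Assumed form of (167): truncated-Poisson (Erlang) loss probability."""
    terms = [density**i / factorial(i) for i in range(channels + 1)]
    return terms[-1] / sum(terms)

def loss_lost_calls_held(channels: int, density: float) -> float:
    """Assumed form of (159): Poisson chance of `channels` or more calls."""
    below = sum(density**j / factorial(j) for j in range(channels))
    return 1.0 - exp(-density) * below

cleared = loss_lost_calls_cleared(10, 4.0)   # level of Curve III
held = loss_lost_calls_held(10, 4.0)         # level of Curve IV
print(f"cleared: {cleared:.4f}  held: {held:.4f}  ratio: {held / cleared:.2f}")
```

Under these assumptions the first figure comes out near one-half of one per cent and the second roughly half as large again, which is the relation between Curves III and IV described in the text.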

That there is no “hang-over” effect when λ exceeds ν by 1 is
evident from a common-sense point of view. Hence the modified
Assumption 8 should give exactly the same results regardless of
whether lost calls are held or cleared. In other words, Curves III,
V and VI should all cross at the value λ = 11. That they do so is
evident from the figure, as well as from the fact that in this case
(167), (169) and (162) are all identical.


§ 127. Numerical Comparison of Formulae; The Dependence
of the Allowable Traffic Density upon the Number of 
Sources, when the Proportion of Loss is Fixed 


The curves of Fig. 39 show very satisfactorily the essential
differences between the results to which our various combinations
of assumptions lead. They are open to the objection,
however, that they give an exaggerated idea of the practical
importance of these differences. Ordinarily probability formulae
are not used, as is done here, to compute the proportion
of loss which corresponds to a preassigned amount of traffic,
but for the converse purpose of computing the maximum
allowable traffic density when the allowable proportion of loss
is known. Since small changes in the traffic density produce
large changes in the proportion of loss, formulae which give
widely different results when used in the former way may
agree surprisingly well when used in the latter.

In order that no such erroneous impressions may be produced,
curves showing the traffic density corresponding to a
loss of one call in one hundred are shown in Fig. 40. As
before, a group of ten channels is considered and the number
of sources is varied through the range from 0 to 100. The
different curves correspond to the six alternative formulae of
Table XXXII. Formulae (167) and (159) again lead to the
straight lines III and IV, which extend unbroken even when
λ < 10. Formula (169) leads to Curve V, which coincides with
III when λ > 10. Similarly, (162) leads to a curve which
crosses III and V at λ = 11 and for all subsequent values
practically coincides with IV. From a practical standpoint
these four curves are sufficiently nearly alike that any one of
them might be used in place of any other.

Fig. 40. Comparison of Various Formulae for the Allowable
Traffic Density when the Probability of Loss is Held Constant.
(Vertical axis: allowable traffic density, ε. Horizontal axis:
number of sources, λ.)

Curves I and II, however, which are obtained from formulae
(165) and (158) and therefore correspond to the first row of
Table XXXI, differ from the others by amounts which are of
engineering importance. For instance, when the number of
sources is 15 they allow these sources to originate about 20
per cent more traffic than is allowable when the other formulae
are adopted.

In other words, little change of practical consequence is
introduced in our results by shifting from the assumption of
lost calls held to the assumption of lost calls cleared; or by
shifting from Assumption 8 to its modified form 9. The only
difference which is of serious consequence comes from using, on
the one hand, the assumption of independent sources (Assumption 7)
and on the other the assumption that the chance of a new
call being made during a short period of observation is not affected
by the state of the system when that period of observation begins
(Assumption 8 or 9).


§ 128. Charts for Purposes of Computation 
Since all of the formulae of Table XXXII group themselves
into two similar classes in such a way that members of the
same class give very similar results, while members of different
classes do not agree so well, it will be sufficient for the further
purposes of this study, as well as for most practical needs, to
confine attention to a typical pair. For this purpose that pair
is chosen which corresponds to the extreme conditions represented
by Curves I and IV in Figs. 39 and 40. All the other
formulae give results which lie intermediate to these two but
agree with the one or the other of them sufficiently well that
no account need generally be taken of the differences.

Fig. 41 is a working chart computed in accordance with
equation (165). The entire figure corresponds to a loss of
one call per thousand. Each curve corresponds to a group of
channels, the size of which is indicated by the attached number.
The numbers along the left-hand margin represent the number
of sources, while the numbers at the bottom give values of
3600ε (that is, n, the number of calls per hour, multiplied
by T, the holding time in seconds).¹


As an illustration of the use of this chart, suppose it is desired to
assign to a group of ten channels a group of sources, each of which
originates on the average three calls of 100 seconds’ duration per
hour. Then nT = 300. Entering the chart, it is found from the
curve for ν = 10 that the ordinate corresponding to this value is
λ = 41. Hence 41 sources may be assigned to the group of ten
channels.

As another illustration, suppose 200 sources are to be accommodated
by switches capable of reaching ten trunks each. Suppose on
the average these sources originate during the busy hour two calls of
an average duration of 140 seconds, and that it is required to find how
they shall be grouped. Multiplying the calling rate by the holding time
gives the number 280. Entering the chart with this value it is found
that each group of trunks is capable of accommodating 43 sources.
Therefore 4 full groups of trunks are required. There then remain
28 sources to be accommodated by the odd group. The point upon
the chart which corresponds to λ = 28 and 3600ε = 280 lies between
the curves marked 7 and 8. Hence the odd group will require 8
channels to carry its traffic. The grouping of the sources will therefore
be 4 groups of 43 and 1 group of 28, and the channels required
will be 4 groups of 10 and 1 group of 8, or a total of 48.

¹ In the telephone industry, calling rate and holding time are usually
stated in this way.

Fig. 41. Working Chart for the Engset Formula. Probability of
Loss = 0.001. (Vertical axis: number of sources per group, λ.
Horizontal axis: call-seconds per source, nT. One curve for each
size of group, ν.)

Fig. 42. Working Chart for the Engset Formula. Probability of
Loss = 0.01. (Axes as in Fig. 41.)


Fig. 43. Working Chart for the Poisson Formula. (Vertical axis:
allowable traffic density, ε. Horizontal axis: size of group, ν.)


Fig. 42 is a similar chart except that it corresponds to a 
probability of loss of one call per hundred. Its use is identical 
with that of Fig. 41. 

In Fig. 43 are given working curves corresponding to the
formula (159). Their use is slightly different from that of
the curves in Figs. 41 and 42. The size of the group of channels
is now represented by the numbers along the horizontal
axis instead of by those on the curves, while the curves themselves
correspond to a particular value of the probability of
loss. Curves are given for Π = 0.01 and Π = 0.001. The
vertical axis now represents the maximum allowable traffic
density, from which the number of sources must be determined
since the values of λ do not explicitly occur.


The use of this chart may be illustrated by solving exactly the
same problems as before. In the first case a group of 10 channels
is available. Entering the chart, the allowable traffic density for
such a group (for a loss of one call in a thousand) is found to be
ε = 2.96. The number of sources which can be accommodated is
the largest number the traffic from which does not exceed this
density. The traffic density for a single source is nT = 300/3600 =
0.0833. Hence the number of sources which can safely be
accommodated is 2.96/0.0833 = 35. This number corresponds to
the 41 obtained from the use of Fig. 41.
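
The chart reading in this first illustration can be reproduced approximately by computation. The following is a minimal sketch, assuming that the Poisson formula (159) gives the loss as the Poisson probability of ν or more simultaneous calls; the function names and the bisection search are illustrative choices, not part of the text.

```python
from math import exp, factorial, floor

def poisson_loss(channels: int, density: float) -> float:
    """Assumed form of (159): Poisson chance of `channels` or more calls."""
    below = sum(density**j / factorial(j) for j in range(channels))
    return 1.0 - exp(-density) * below

def allowable_density(channels: int, loss: float) -> float:
    """Largest density whose loss does not exceed `loss`, found by bisection."""
    low, high = 0.0, float(channels)
    for _ in range(60):
        mid = (low + high) / 2
        if poisson_loss(channels, mid) <= loss:
            low = mid
        else:
            high = mid
    return low

density = allowable_density(10, 0.001)
per_source = 3 * 100 / 3600            # three 100-second calls per hour
sources = floor(density / per_source)  # whole sources only
print(round(density, 2), sources)
```

The search relies on the loss increasing steadily with the traffic density, which holds for the Poisson sum.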


In the second illustration, where it is necessary to accommodate
200 sources originating on the average two calls of 140 seconds
holding time apiece, the average traffic density of a source is nT =
280/3600 = 0.0778. Since the traffic density of a group may be 2.96 we
find that the number of sources which can be accommodated is
2.96/0.0778 = 38. There are therefore required 5 full groups of
10 trunks each, together with an odd group sufficiently large to
handle the traffic from the remaining 10 sources. These 10 sources
give rise to a traffic density amounting to 0.0778 × 10 = 0.778.
Entering the chart with this value it is found that the number of
trunks required for this odd group is 5. The total number of trunks
is therefore 55. The difference between this result and that obtained
from Fig. 41 is about 14 per cent.
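
The grouping arithmetic of the two solutions can be laid side by side. In the sketch below the group capacities (43 and 38 sources) and the sizes of the odd groups (8 and 5 trunks) are simply the chart readings quoted in the text; the helper name `plan_groups` is hypothetical.

```python
def plan_groups(total_sources: int, sources_per_full_group: int) -> tuple[int, int]:
    """Split the sources into full groups plus one odd group of the remainder."""
    return divmod(total_sources, sources_per_full_group)

# Chart readings taken from the text, not recomputed here.
fig41_full, fig41_odd = plan_groups(200, 43)   # 4 full groups, 28 left over
fig43_full, fig43_odd = plan_groups(200, 38)   # 5 full groups, 10 left over

fig41_trunks = fig41_full * 10 + 8   # odd group of 28 sources needs 8 trunks
fig43_trunks = fig43_full * 10 + 5   # odd group of 10 sources needs 5 trunks

excess = (fig43_trunks - fig41_trunks) / fig41_trunks
print(fig41_trunks, fig43_trunks, f"{excess:.1%}")
```

The relative excess of the second total over the first is the "about 14 per cent" mentioned above.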


By means of charts such as these, computations can be
carried out without the expenditure of an undue amount of
time. That is, the complicated formulae listed in Table XXXII
can actually be reduced to a form in which their use for practical
purposes is feasible. Figs. 41, 42 and 43 do not cover a
sufficient range of values to make them satisfactory for such
purposes; and, indeed, it would be difficult to print in pages
such as these, charts on a scale large enough to be of much
practical value. It is evident, however, that constructing such
charts is possible; and since they need be made but once, the
fact that the computations involved in producing them are
tedious is a matter of minor importance.




PROBLEMS 


1. In a certain boot-black “parlor” of a railroad terminal, the
expectation of the number of customers during the peak period is
75 per hour. On the average a shine requires 4 minutes. Assume
that a prospective customer, if the chairs are all full, merely looks
in the door and walks away. Assume also that the customers
arrive individually and collectively at random. What probability
formula would you use to find the number of chairs required in order
that the proportion of lost trade should not exceed 0.01?

2. Use the Poisson Formula as a means of obtaining a first
approximation to the number of chairs required in the last example. Then
find, by computation for neighboring values of ν, the correct number.

3. It is desired to measure the number of bursts of static per
second by means of a recording chronograph. Assuming that the
expected number per second is n = 1.7, and that the expected
duration of each is T = 0.3 second, what proportion will not give
distinct signals?

(Assume that overlapping bursts will be recorded as one; also
that the bursts occur individually and collectively at random.)

4. The chronograph record consists of a number of separate
entries of measurable duration. It is possible to read directly,
therefore, the number of separate entries n* and their average
duration T*. Due to overlapping, however, these are not equal to n and
T. Develop formulae for n and T in terms of n* and T*.

5. During an hour, such a chronograph record showed 18,241
separate entries, the aggregate length of which was 2977 seconds.
What figures do you deduce for the frequency and duration of the
static pulses?


§ 129. Some Hunting Problems 


We now turn to a different class of problems: those which
concern the amount of traveling which the “switch” must do
in steering a call along its way. Such problems are of importance
for two reasons: first, because the amount of wear to
which a switch is subject ordinarily decreases with decreasing
travel; second, because in many cases the time consumed in the
hunting operation increases by just that much the period
required to complete the connection. We choose again a
number of problems which typify, in the main, the methods of
solution, without requiring an explanation of any of the
technical details of telephony.

In the first place, we must notice that the “switches”
themselves may be of either of two types. Each switch may
be permanently connected to a “source,” its function being to
select a channel for that source to use when the source needs
it. Such switches are technically known as “selectors.” Or
it may be permanently connected to the channel and, when a
channel is needed by one of its group of sources, it may go
in search of the “calling” source. Such switches are technically
known as “finders.” In either case, of course, the
“group of switches” is synonymous with one of the groups
about which we have been speaking; that is, it is immaterial
whether we speak of a “group of sources” or of the “group
of selector switches” to which those sources are connected;
and the same is true of channels and finder switches. In our
present study it will be simpler to think in terms of the switches
in each case.

Fig. 44.

The second point which we must notice especially, is that
though the switches are assumed to be identical and to reach
the same group, they may not reach the various members of
that group in the same order. To be explicit about this, we
may think of a group of six selectors, represented schematically
by the arrows of Fig. 44; and we may suppose that they
reach a group of three channels represented by the horizontal
lines. So far our description applies equally well to either
part (a) or part (b) of the figure. But all six switches in the
arrangement (a) reach channel 1 first, channel 2 next, and
channel 3 last; while in arrangement (b) each channel appears
in the lowest (“preferential”) position before two switches,
in the second-choice position before two others, and finally as
last-choice positions for the remaining two. Condition (a) is
known technically as a “straight multiple” and condition (b)
as a “slipped multiple.”¹

The third point which we must notice concerns the behavior
of the switch when it is released from service. Customarily
it does one of two things: (a) returns to
a “rest position,” so that all idle switches
are lined up in a neat schematic row as
shown in Fig. 44; or (b) stays where it is, so
that idle switches may be most anywhere,
as shown in Fig. 45.

Finally, in the fourth place, the switches may hunt singly 
or in groups. When they hunt as a group, the switch which 
first succeeds in its search takes charge of the call and the rest 
stop hunting. 

Naturally, a difference in any of these three essentials 
may very profoundly affect the length of hunt; so separate 
consideration must, in general, be given to each possibility. 
We shall take up a number of cases in order. 


Fig. 45.


§ 130. Individual Hunting from a Normal Position 


ONE FINDER STARTS FROM NORMAL REST POSITION. STRAIGHT MULTIPLE.

The simplest of all possible problems is that of a single
finder hunting over a straight multiple. Obviously, how far
it must go is determined solely by the position of the source
which wants it. If there are ν sources and all are equally
likely to be busy, the expected number of terminals tested will
be

$$e(k) = \frac{\nu + 1}{2} \qquad (170)$$

and the probability of testing more than a specified number k
will be

$$p(>k) = 1 - \frac{k}{\nu} \qquad (171)$$

¹ Not all “slipped multiples” are arranged exactly as in (b), but we shall use the
term for this simplest arrangement only.
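
Formulae (170) and (171) can be confirmed by direct enumeration. The sketch below, with hypothetical names, supposes that the calling source is equally likely to occupy any of the ν positions and that reaching position m costs m tests.

```python
from fractions import Fraction

def finder_hunt(sources: int):
    """Exact hunt statistics for one finder starting from rest, when the
    calling source is equally likely to sit at any of `sources` positions."""
    positions = range(1, sources + 1)
    expected = Fraction(sum(positions), sources)

    def more_than(k: int) -> Fraction:
        # Fraction of positions that need more than k tests.
        return Fraction(sum(1 for p in positions if p > k), sources)

    return expected, more_than

expected, more_than = finder_hunt(10)
print(expected, more_than(3))   # compare with (nu + 1)/2 and 1 - k/nu
```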


ONE FINDER STARTS FROM NORMAL REST POSITION. SLIPPED MULTIPLE.

If the multiple, instead of being “straight,” is “slipped,”
the formulae (170) and (171) are still unchanged. The principal
difference between this case and the former one lies in the
fact that in this case all sources get equivalent grades of
service, since all appear equally often in the favorable position,
while in the former case those which appear nearest the rest
position get a better grade of service than the others. The
grade of service, averaged over all lines, is the same in both
cases, however.

For our second problem we may state the conditions as
follows:

ONE SELECTOR STARTS FROM NORMAL REST POSITION. STRAIGHT MULTIPLE.

Each channel appears in the same position before every
switch; all switches start from rest; if every channel is busy
the switches do not repeat the test, but return to normal and
discard the call. Under these circumstances it is evident
that the channels are not equally used.

The first to be tested will be in use most of the time, while
the last in the group will seldom be busy. It is not true,
however, as might be supposed at first thought, that when
six channels are busy it is always the lowest six, for the following
reason: If, when the last call preceding the one under
consideration was made, the first six channels were busy, this last
preceding call was assigned to the seventh channel; but
between that time and the present, one of the six calls which
were originally in progress may have been discontinued.
This would leave exactly six busy channels, but they would
not be the lowest six and a hunt of seven terminals would not
be required. For example, if the call which occupied the
lowest channel in the group has been discontinued, the switch
which handles our present call need only test this lowest
channel in order to find accommodation; so instead of hunting
seven terminals it need hunt one only.

What we must find is the probability that the first k channels
are busy and the next is idle. This we may easily do by noting
what would happen if, for some reason, all those calls which
did not find service among these k channels were instantly
cleared. Obviously this would have no effect whatever upon
the service rendered by these k channels. Hence the chance
of the first k channels being simultaneously busy is equal to the
probability of all k busy on the basis of formula¹ (167).

Hence we have²:

$${}_{(k)}P = \frac{\varepsilon^{k}/k!}{\sum_{i=0}^{k} \varepsilon^{i}/i!} \qquad (172)$$

The probability which we desire, that is, ${}_{(k)}P_{k+1}$, may
be obtained by subtracting from all the cases in which the
first k channels are busy those cases in which the (k + 1)st
channel is also busy. The latter cases, however, are represented
by the probability ${}_{(k+1)}P$, which is given by the same
law as ${}_{(k)}P$. We therefore have for the probability of hunting
exactly k + 1 terminals

$$p(k+1) = {}_{(k)}P_{k+1} = {}_{(k)}P - {}_{(k+1)}P. \qquad (173)$$

Similarly the probability of hunting exactly k terminals is

$$p(k) = {}_{(k-1)}P - {}_{(k)}P = \frac{\varepsilon^{k-1}/(k-1)!}{\sum_{i=0}^{k-1} \varepsilon^{i}/i!} - \frac{\varepsilon^{k}/k!}{\sum_{i=0}^{k} \varepsilon^{i}/i!} \qquad (174)$$

¹ We shall assume that calls are distributed individually and collectively at random.
Other assumptions could easily be used, but this will be quite sufficient for our present
purposes.

² The notation has this significance: ${}_{1}P$ means “the first trunk busy”; ${}_{1,3}P_{2}$ means
“the first and third busy and the second idle”; ${}_{(4)}P_{5}$ means “the first four busy and
the fifth idle”; ${}_{6}P_{(5)}$ means “the first five idle and the sixth busy.” The scheme, I
think, is obvious.
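
The hunting probabilities for the straight multiple can be tabulated directly. The sketch below assumes that the probability of the first k channels being busy has the lost-calls-cleared (Erlang) form with k channels, as the argument above suggests; the function names and the sample values (six channels, traffic density 2) are illustrative only. It works in exact fractions and, since the switch does not repeat the test, takes the chance of hunting over all ν terminals to be the chance that the first ν − 1 are busy.

```python
from fractions import Fraction
from math import factorial

def first_k_busy(k: int, density: Fraction) -> Fraction:
    """Chance that the first k channels are all busy (assumed Erlang form)."""
    terms = [density**i / factorial(i) for i in range(k + 1)]
    return terms[-1] / sum(terms)

def hunt_distribution(channels: int, density: Fraction) -> list[Fraction]:
    """p(1), ..., p(channels) for one selector on a straight multiple."""
    busy = [first_k_busy(k, density) for k in range(channels)]
    pmf = [busy[k - 1] - busy[k] for k in range(1, channels)]
    pmf.append(busy[channels - 1])   # last terminal: tested once, never repeated
    return pmf

pmf = hunt_distribution(6, Fraction(2))
mean = sum(k * p for k, p in enumerate(pmf, start=1))
print([float(p) for p in pmf], float(mean))
```

Because the differences telescope, the probabilities add up to one and their mean equals the sum of the first-k-busy probabilities for k from 0 to ν − 1, which gives a convenient check on any tabulation.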


The expected hunt under these circumstances is¹

$$e(k) = \sum_{k=1}^{\nu} k\,p(k).$$

This can be thrown into the form

$$e(k) = \sum_{k=0}^{\nu-1} {}_{(k)}P \qquad (175)$$

which is more suitable for numerical calculation.

ONE SELECTOR STARTS FROM NORMAL REST POSITION. SLIPPED MULTIPLE.

If the multiple is so arranged that each channel appears
as often in one position as in any other the situation is quite
different. In this case busy channels will be distributed in
haphazard fashion over the entire bank instead of being
concentrated near the bottom, and the chance of an idle channel
being near the bottom of the group will be materially greater
than in the preceding example. The solution of the problem
is obtained by the following argument:

The probability of exactly j busy channels is denoted by²
P(j). If these j busy channels are distributed at random over
the entire group, the chance that the first k tested by the
switch are all busy is

$${}_{(k)}P = \frac{C^{\nu-k}_{j-k}}{C^{\nu}_{j}} = \frac{j!\,(\nu-k)!}{\nu!\,(j-k)!} \qquad (176)$$

The probability of testing more than k terminals is found
by summing the product of these two expressions for every
possible value of j. Formally it is given by the formula

$$p(>k) = \sum_{j=k}^{\nu} {}_{(k)}P \cdot P(j).$$

¹ Note how the assumption of “lost calls cleared” is justified by the fact that,
when all ν channels are busy, the switches discard the call. Hence the chance of
hunting just ν terminals is just ${}_{(\nu-1)}P$, while p(>ν) = 0.

² We shall use for it the formula P″(j), to conform to the conditions as we have
laid them down. In other circumstances some other formula might be needed.

Using formula (166) for P(j) we find that

$$p(>k) = \frac{(\nu-k)!}{\nu!} \cdot \frac{\sum_{j=k}^{\nu} \varepsilon^{j}/(j-k)!}{\sum_{i=0}^{\nu} \varepsilon^{i}/i!}.$$

If, now, we call j − k by a new symbol (which may as well
be i as anything, since the two summations are entirely
independent) this becomes¹

$$p(>k) = \frac{(\nu-k)!\,\varepsilon^{k}}{\nu!} \cdot \frac{\sum_{i=0}^{\nu-k} \varepsilon^{i}/i!}{\sum_{i=0}^{\nu} \varepsilon^{i}/i!} \qquad (177)$$

As before,

$$p(k) = p(>k-1) - p(>k). \qquad (178)$$

We may also find the expected hunt, which is

$$e(k) = \sum_{k=0}^{\nu} k\,p(k).$$

This, however, can be further reduced to a form which is more
suitable for purposes of computation by noting that when
written out in full it is:

$$\begin{aligned}
e(k) = {} & p(>0) - p(>1) \\
& + 2\,p(>1) - 2\,p(>2) \\
& + 3\,p(>2) - \cdots \\
& \cdots - (\nu-1)\,p(>\nu-1) \\
& + \nu\,p(>\nu-1),
\end{aligned}$$

which is obviously equal to

$$e(k) = \sum_{k=0}^{\nu-1} p(>k). \qquad (179)$$

¹ Except for k = ν. It is obvious, from a common-sense standpoint, that the
switch cannot hunt more than ν terminals. Hence p(>ν) = 0. The formula fails to
give this because the switch might (logically) fail in its νth trial if all channels were
busy, and in that case a (ν + 1)th trial would be needed for success.




As an illustration of the extent to which the average hunt
is reduced by slipping the multiple, the results of a numerical
example computed in accordance with each of these formulae
may be presented. The case chosen is one in which the total
number of channels is ν = 100 and the expected number busy
is ε = 34.49. If each channel appears in the same position
before all the switches the average number of terminals tested
is found to be 19.6, while if the multiple is slipped the average
number of tests is only 1.53.
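
A comparison of this kind is easily repeated by machine. The sketch below is written under two assumptions that the reader should check against the formulae cited: that the straight-multiple average is the sum, over k from 0 to ν − 1, of the lost-calls-cleared (Erlang) probability that the first k channels are busy, and that the slipped-multiple average is the corresponding sum of the probabilities of testing more than k terminals when the busy channels follow the same truncated Poisson law. The function names are hypothetical.

```python
from math import factorial

def partial_sum(n: int, density: float) -> float:
    """Sum of density**i / i! for i = 0, ..., n."""
    return sum(density**i / factorial(i) for i in range(n + 1))

def straight_mean(channels: int, density: float) -> float:
    """Average hunt when every switch meets the channels in the same order."""
    return sum((density**k / factorial(k)) / partial_sum(k, density)
               for k in range(channels))

def slipped_mean(channels: int, density: float) -> float:
    """Average hunt when busy channels are scattered at random over the bank."""
    whole = partial_sum(channels, density)
    return sum(
        factorial(channels - k) / factorial(channels) * density**k
        * partial_sum(channels - k, density) / whole
        for k in range(channels)
    )

print(round(straight_mean(100, 34.49), 2), round(slipped_mean(100, 34.49), 2))
```

If the assumed forms are those used in the text, the two printed values should fall close to the averages quoted above; the slipped figure is very nearly 1/(1 − ε/ν), since each successive test finds a busy channel with a probability of roughly ε/ν.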


§ 131. Individual Hunting with Stay-Put Switches 


In studying the problem of individual hunting with stay- 
put switches we shall assume that the switch is capable of 
hunting over the entire group of channels no matter from what 
position it may start, but that if no idle trunk is then found it 
will not repeat the test. Otherwise when all trunks are busy 
the number of terminals tested before an idle trunk is found 
would depend upon the length of time which elapses before a 
trunk becomes idle and the speed with which the test is made, 
both of which questions we wish to avoid. 

ONE STAY-PUT SELECTOR HUNTING OVER A STRAIGHT MULTIPLE.

We consider only the case of selector switches, as the other
case is trivial. It is obvious that after the system has been
in operation for a certain length of time, the switches will be
distributed at random over the terminal bank.
Since the busy channels are likewise distributed at random
over the bank they are also at random with respect to any
switch. Hence in this case the hunting probabilities follow
exactly the same law as if the switches started from a normal
position and hunted over a slipped bank. The formulae to be
applied are therefore (177) and (178).

ONE STAY-PUT SELECTOR HUNTING OVER A SLIPPED MULTIPLE.

Since the positions of the busy channels are distributed at
random with respect to any one switch, even when the multiple
is straight, it follows that no change is introduced if
a slipped multiple is used, and formulae (177) and (178) again
apply.




§ 132. Group Hunting with Stay-Put Switches 


Under this heading we shall consider two distinct cases 
distinguished in the following manner: When a group of 
switches of the stay-put variety are all started at once it may 
happen that the first to reach the desired terminal¹ may be,
not a single switch, but two or more which are accidentally 
moving together. The problem then arises as to what dis- 
position is to be made of them, for it is obviously not desirable 
to allow them all to connect with it. There are two alterna- 
tives: one is to allow some one of the group to seize the ter- 
minal; the other is to pass them all by and wait until it is 
tested by a switch which is travelling alone. 

GROUP OF STAY-PUT FINDER SWITCHES HUNTING OVER A STRAIGHT
MULTIPLE. IF TWO ARRIVE AT ONCE, ONE TAKES CHARGE OF THE CALL.

In the first case, if λ switches are searching over a straight
multiple, the solution is easily obtained by this line of thought:

Any individual switch is just as likely to be on one terminal
as another, quite independently of where the other
switches may be. Then let the heavy line of Fig. 46 be the
calling source, there being in all ν sources and λ
switches. The chance that a particular
switch is on some one of the k bracketed sources to begin
with is k/ν; and the chance that it is not there is 1 − k/ν.
As the positions of the λ switches are quite independent,
the chance that no one of the switches is on any of the
bracketed sources is (1 − k/ν)^λ. But this is just the condition
under which the switches would have to make more than k tests
in order to reach the calling source. Hence we have:

$$p(>k) = \left(1 - \frac{k}{\nu}\right)^{\lambda} \qquad (180)$$

As before

$$p(k) = p(>k-1) - p(>k) = \left(1 - \frac{k-1}{\nu}\right)^{\lambda} - \left(1 - \frac{k}{\nu}\right)^{\lambda} \qquad (181)$$

The expectation of k, that is, the number of tests we may
expect them to make before giving service, is

$$e(k) = \sum_{k=0}^{\nu-1} \left(1 - \frac{k}{\nu}\right)^{\lambda} = \frac{1}{\nu^{\lambda}} \sum_{n=1}^{\nu} n^{\lambda} \qquad (182)$$

Fig. 46.

¹ “Terminal” is here used as a general term meaning “source or channel.”

GROUP OF STAY-PUT FINDER SWITCHES HUNTING OVER A STRAIGHT
MULTIPLE. IF TWO OR MORE ARRIVE AT ONCE NEITHER TAKES CHARGE
OF THE CALL.

In the second case which we have mentioned the formula
is decidedly more complicated, and requires the use of the
principle of alternative compound probabilities for its
evaluation. We begin as follows:

If there are just i′ switches resting
on the k bracketed sources of Fig. 46,
the chance that there are exactly i on
the source next below them is given by the formula

$${}_{k}p_{i'}(i) = C^{\lambda-i'}_{i} \left(\frac{1}{\nu-k}\right)^{i} \left(1 - \frac{1}{\nu-k}\right)^{\lambda-i'-i}.$$

Evidently if the i′ switches which rest on the k sources are so
arranged that no source has exactly one switch, the test must
exceed k terminals. That is, it will be more than k + 1 if
i ≠ 1, while if i = 1 it will be exactly k + 1. Hence if we
knew the probability of there being i′ switches on the bracketed
sources so arranged that no source has on it exactly one switch,
we would be able to compute the probability of a hunt of any
desired magnitude by merely building up the proper form of
alternative compound probability. For the moment we may
content ourselves with writing a symbol for it, in the hope
that later on we may be able to find a formula for it. Let us
choose for this purpose the notation ${}_{k}P(i')$, the prefixed
subscript being added in this case, as in the case of ${}_{k}p_{i'}(i)$, to call
attention to the fact that the probability depends upon the
particular value of k which we choose to consider. Then we
have at once, as a formal expression for our solution,

$$p(k+1) = \sum_{i'=0}^{\lambda-1} {}_{k}P(i')\;{}_{k}p_{i'}(1) \qquad (183)$$


As for the determination of an expression for ${}_{k}P(i')$, we
notice that if there are exactly i′ switches on k + 1 sources,
and if these switches are arranged in the fashion described,
it must be true, either that all are on the first k and none on
the (k + 1)st, or else all but two are on the first k and those two
are on the (k + 1)st, or else all but three are on the first k and
those three on the (k + 1)st, or some similar arrangement.
The only cases which are excluded are those which would
require only one on the first k or only one on the (k + 1)st, since
either of these cases would require a single switch on some
source. Hence we have the recursion formula

$$\begin{aligned}
{}_{k+1}P(i') &= {}_{k}P(i')\,{}_{k}p_{i'}(0) + {}_{k}P(i'-2)\,{}_{k}p_{i'-2}(2) + {}_{k}P(i'-3)\,{}_{k}p_{i'-3}(3) + \cdots \\
&= \sum_{i} {}_{k}P(i'-i)\,{}_{k}p_{i'-i}(i),
\end{aligned}$$

it being understood that the values i = 1 and i = i′ − 1 are
not to be included in the summation. From this recursion
formula it is possible to obtain ${}_{k+1}P(i')$ if ${}_{k}P(i')$ is known.
Hence if the values of this function are known for any one
value of k they may be found for all others. But it is obvious
at once that

— 1)-" P 

yp» 


Wig ow 


With this as a start, it requires only routine algebra to show
that¹


) Meee 
Pi’) = Ch 2 ar — ai 44 C8), 


A-V 
SPU) = CpL—D yrs ai af 14 371-1) 7H —1) Cal 


a ee ee 

‘These expressions have different values according as i’ is, or is not, equal to &. 
This has been taken account of by the terms having the factors C{y_, which vanish 
for all values of i’ except i’ = k. 


Wee 
and in general 


Ve % 


Pi) = PPS or 


ts oe 


GROUP HUNTING 


=0 a 


367 


5 (k — Aye 


i! : 
Gera tee: 


We now have formulae for both ${}_{k}P(i')$ and ${}_{k}p_{i'}(i)$, and can
therefore substitute them in (183). Some more routine algebra
then shows that


k | ; bea hNeWed 
pe+iy= S(t 
when & + 1 < », while 
ae, r! py — h)™ 
pv) = 2 (= Patera (184) 


For completeness we quote also the expected number of tests:


x y—-1 
ei(k) = Th a (= i) 4 Chat 
Vv n=0 


‘= 
muy 


re (185) 


These formulae probably serve no other useful purpose, so far
as this text is concerned, than that of showing how complicated
it is possible for problems of this general type to become.


The method by means of which (180) 
and (181) are derived can be applied 
without change to the case of a slipped 
multiple. Hence these formulae are
equally valid in this case. The same is 
true of (184) and (185). In both cases 
the mental picture upon which the 
argument is based requires some modi- 
fication; but the steps to be followed 
and the results themselves are identical 
throughout. 


GROUP OF STAY-PUT 
FINDER SWITCHES HUNT- 
ING OVER A_ SLIPPED 
MULTIPLE. IF TWO OR 
MORE ARRIVE AT ONCE, 
ONE TAKES CHARGE OF 
THE CALL. 


GROUP OF STAY-PUT 
FINDERS HUNTING OVER 
A SLIPPED MULTIPLE. IF 
TWO OR MORE ARRIVE 
AT ONCE, NEITHER TAKES 
CHARGE OF THE CALL. 


GROUP OF STAY-PUT
SELECTORS HUNTING
OVER A STRAIGHT MULTIPLE.
IF TWO OR MORE ARRIVE
AT ONCE, ONE
TAKES CHARGE OF THE
CALL.


If instead of finders, however, we are dealing with selectors,
the hunts are in all cases likely to be shorter. If two or more
can arrive at once and still take charge of the call, the formula
by means of which the results are expressed is not very difficult
to obtain. We fall back upon the similar case with only one
selector, for which the solution is given
by (177) and (178). It is obvious that any member of the
group at present under consideration might be the one selector
previously considered. Hence each of the group must obey
the laws obeyed by that one. This being understood, the
chance that some i switches would need to hunt exactly k
terminals while the rest hunt more than k is


Ch [pi(A)F [pi(> APS 
the $p_1(k)$ being given by (178) and $p_1(> k)$ by (177). Hence
the chance of a hunt of just k terminals is¹


a(t) = EC) ps [a> BP 


= [pilk) + pi(> &)I‘ — [p(> AP 
= [pi(> k — 1)P — [pi(> AP 


(at) GE) 
oleic mii viia ata ae 
The chance of a hunt of more than k terminals is
bas 1 x 
Pr(> Rk) = [pi(> &A)P = (a2 is (187) 
v—-k 
and the expected hunt is
pate, MaONX 
a) = B(eet). (188) 
E=0 v—-k 


As an illustration of the extent to which group hunting 
may reduce the average hunt we may consider the same 
illustration as before. 

¹Except for k = ν. The formula in this case lacks the negative term.




We found that with a straight multiple and a switch which 
starts from normal, the average number of terminals tested 
was 19.6. With stay-put switches, only one being assigned, 
or with a random slip in the multiple, this was reduced to 1.53. 
If the switches stay put and are started in groups of two, 
three, four or five, the answers are, respectively, 1.14, 1.04, 
1.014 and 1.005. Remembering that the minimum is a test 
of one terminal, the extent to which the test is reduced by 
starting a group is evident. 
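The group-hunting argument can be sketched numerically. The sketch below assumes only what the text states: every switch in the group obeys the single-switch law independently, so the group hunts more than k terminals only if every one of its switches does. The function names, and the geometric single-switch law used to exercise them, are illustrative assumptions; the geometric law is a stand-in and is not the book's (177). The expected-hunt helper further assumes that a hunt always ends by the last terminal.

```python
from typing import Callable, List


def group_hunt_distribution(single_gt: Callable[[int], float],
                            group_size: int, terminals: int) -> List[float]:
    """Chance that the group's hunt tests exactly k terminals, k = 1..terminals.

    single_gt(k) is the chance that one switch alone hunts more than k
    terminals; the group hunts more than k only if every switch does.
    """
    return [single_gt(k - 1) ** group_size - single_gt(k) ** group_size
            for k in range(1, terminals + 1)]


def expected_group_hunt(single_gt: Callable[[int], float],
                        group_size: int, terminals: int) -> float:
    """Expected hunt as the sum, over k = 0..terminals-1, of the chance of
    hunting more than k terminals (assumes single_gt(terminals) == 0)."""
    return sum(single_gt(k) ** group_size for k in range(terminals))


def make_geometric_law(busy: float, terminals: int) -> Callable[[int], float]:
    """HYPOTHETICAL single-switch law, for illustration only."""
    def single_gt(k: int) -> float:
        return busy ** k if k < terminals else 0.0
    return single_gt


if __name__ == "__main__":
    law = make_geometric_law(busy=0.35, terminals=10)
    for size in (1, 2, 3, 4, 5):
        print(size, round(expected_group_hunt(law, size, 10), 4))
```

Whatever single-switch law is supplied, enlarging the group can only shorten the expected hunt, since each term of the sum is a probability raised to a higher power.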


§ 133. The Problem of Double Connections


One type of problem which frequently presents itself in 
connection with the use of apparatus by a considerable number 
of different people is that of preventing one person from seizing 
what is already in use and thus causing inconvenience to some- 
body else. This is frequently accomplished by operating a 
relay, or some similar device, which cuts off access to the 
particular channel which has been assigned. Obviously, such 
a blocking device will ordinarily require time for its operation, 
and during a portion of this time at least it will be possible for 
another source to seize the already busy channel, thus creating 
what is technically termed a “double connection.” We are
interested in determining what proportion of calls can be 
expected to suffer inconvenience from this source. 

Problems of this sort ordinarily arise in checking up 
whether a proposed system does or does not meet certain 
specified standards, and since we can tolerate much larger 
errors in studying such problems than in studying losses, we 
need not be so careful of the exactness of our assumptions. 
In particular, we may assume that the calls occur individually 
and collectively at random, and that, in assigning channels 
to calling sources, it is a matter of pure chance which of the 
idle channels is chosen. As always, we denote the number of
channels by ν, the calling rate by n, and the expected holding
time by T.

Let us suppose that we observe the group for a short time
dt, at the beginning of which exactly j channels are busy. The
chance that a call is made during this interval is obviously
n dt; and if so, the chance that it is assigned to some particular
channel which we have set out to watch is n dt/(ν − j).

Next we write down, by the use of alternative compound
probabilities, the chance p(δ) that this particular channel is
seized during dt, if we do not know how many channels are
already in use. We assume that the number of calls originated
per unit of time is n and that the number of channels is ν.
Denoting by P(j) the probability that exactly j channels are
busy at the beginning of this interval, it is:

$$p(\delta) = n\,dt \sum_{j=0}^{\nu-1} \frac{P(j)}{\nu - j}. \tag{189}$$

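If we assume that lost calls are instantly cleared, P(j)
must be given the form (166); whence (189) becomes

$$p(\delta) = \frac{dt}{T}\cdot\frac{\displaystyle\sum_{j=0}^{\nu-1} \frac{\varepsilon^{j+1}}{j!\,(\nu-j)}}{\displaystyle\sum_{j=0}^{\nu} \frac{\varepsilon^{j}}{j!}}, \tag{190}$$

which we shall denote simply as f(ε) dt/T, f(ε) meaning, of
course, the fraction in (190).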

It remains to determine the relation of this result to the
probability of double connections. From the fashion in
which we have derived it we are assured that it represents the
probability of a call being made upon a particular idle channel
during a particular short interval of observation. One way
of making this observation, however, would be to introduce a
call and see if it is interfered with. Hence it is obvious that
what we have obtained is the probability that our call will be
followed by another within so short a time as to result in a
double connection. In other words f(ε) dt/T is the probability
that a call will be involved in a double connection by
virtue of the fact that another call originating later than itself
obtains access to the same trunk. It is just as likely, however,
to interfere with someone else who arrived earlier, as to be
interfered with by someone who arrives later. Hence f(ε) dt/T
is only half of the total probability that a call will be involved
in a double connection. This leads us at once to


$$p(dc) = 2\,p(\delta) = \frac{2\,dt}{T}\,f(\varepsilon). \tag{191}$$


Fig. 47.


In order to facilitate computations, a chart of the function
f(ε) has been prepared for a considerable number of values of
ν between 1 and 100. This chart, with the abscissae measured
in terms of ε/ν instead of ε for the sake of compactness, is
presented as Fig. 47. Its use may be illustrated by a simple
example:

Suppose there are 20 trunks handling 288 calls per hour,
each with an average holding time of 100 seconds. Then
T = 100, n = 288/3600 = 0.08, ε = nT = 8 and ν = 20; so that
ε/ν = 0.4. Referring to Fig. 47, we find that for this case
f(ε) = 0.720. Suppose that the unguarded interval is 0.05
second, then the probability of a double connection is:

$$p(dc) = \frac{2 \times 0.05}{100}\,f(\varepsilon) = 0.00072.$$

This means that under these circumstances, 72 calls out of
every 100,000 would be involved in double connections, due
account being taken of the fact that every double connection
involves two calls.

Other problems involving double connections can be worked
equally well provided the value of f(ε) is taken from the
proper one of the curves of Fig. 47.
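Where the chart is not at hand, f(ε) can be computed directly. The following is a minimal sketch, assuming that P(j) is the lost-calls-cleared distribution with calls arriving at random, so that f(ε) is the sum of ε^(j+1) / [j! (ν − j)] over j = 0 … ν − 1, divided by the sum of ε^j / j! over j = 0 … ν. The function and parameter names are illustrative, not the book's.

```python
from math import factorial


def f_epsilon(eps: float, nu: int) -> float:
    """The fraction f(eps) for nu channels and traffic density eps = n*T."""
    numerator = sum(eps ** (j + 1) / (factorial(j) * (nu - j)) for j in range(nu))
    denominator = sum(eps ** j / factorial(j) for j in range(nu + 1))
    return numerator / denominator


def double_connection_probability(calls_per_hour: float, holding_time_s: float,
                                  trunks: int, unguarded_s: float) -> float:
    """p(dc) = 2 * (unguarded interval / T) * f(eps), with times in seconds."""
    n = calls_per_hour / 3600.0          # calling rate per second
    eps = n * holding_time_s             # traffic density
    return 2.0 * unguarded_s / holding_time_s * f_epsilon(eps, trunks)


if __name__ == "__main__":
    # The worked example: 20 trunks, 288 calls per hour, T = 100 s, 0.05 s unguarded.
    print(f_epsilon(8.0, 20))
    print(double_connection_probability(288, 100.0, 20, 0.05))
```

For the example in the text this reproduces the chart reading of about 0.720 and a double-connection probability of about 0.00072.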


§ 134. Delays in Awaiting Service 


The problems presented by systems which operate upon a 
delay basis instead of a loss basis — that is, in which a call is 
not discarded when there is no apparatus to handle it, but is 
merely held over until something becomes free— are much 
the most complicated with which the traffic statistician must 
deal. There are several reasons for this: 

In the first place, we have seen in our study of the prob- 
ability of loss that our results are quite independent of the 
lengths of individual calls. They are the same in any two 
systems which possess the same traffic density, whether that 
traffic density be made up of many short calls or a few long
ones, and whether or not the calls are all of the same length,
or of different lengths. When we come to the study of delay
problems, however, this is no longer true. It can be seen at 
a glance, for example, that if the calls are all of like length, 
the delays will be greater the greater that length may be. 




Thus, a system which had an expectation of three twenty- 
minute calls per hour, and one which had an expectation of 
sixty one-minute calls, should be identical so far as probability 
of loss is concerned; but it is obvious from a common-sense 
standpoint that the person who sought service and found no 
available apparatus in the first case would face the prospect of 
a longer wait before some became idle than in the second. 
It is perhaps not so easy to see from an intuitive standpoint
that the distribution of call lengths — whether they are all of 
like length or not, and if not, how they differ — also affects 
the problem; but once we come to formulating our results we 
find that this is true. 

In the second place, when we were dealing with loss prob- 
lems, we were able to affirm that the end-points of calls, like 
their points of origin, were distributed at random. But this is 
no longer true in the case of systems with waiting arrange- 
ments. Thus, if calls are all of unit length, and if there are 
just five trunks, there can only be five calls in progress at any 
time, and hence no more than five can terminate within the 
same unit of time. They may still originate at random, but 
the delays to which some of them are subjected smooth out the 
distribution of their end-points. It is to this fact, indeed, 
that most of the mathematical difficulties are due. 

In the third place, in a system which operates upon a
loss basis, every subscriber who is inconvenienced at all is 
inconvenienced in just the same way as every other: his call 
is discarded. But in a delay system some suffer delays of 
negligible length, others long delays. It is no longer sufficient 
to specify the standard of service by a mere statement that 
such and such a proportion of calls is delayed: it becomes 
necessary to say instead what proportion is delayed more 
than a specified time. This adds one more complication to the
problem. 

As a result of these various complexities there are only 
two delay problems which have sufficiently simple solutions to 
justify their presentation in a text such as this. Indeed, the 
rest have in many instances been dealt with in an approximate
fashion only, and can be said to have been “solved” only in
the sense that engineering design can be carried out with 
reasonable assurance that a sufficient factor of safety exists, 
but not in the sense that the factor of safety is known. We 
present these two solutions only, as illustrations of the simpler 
methods of attack. 

In the first place, if all the calls must be accommodated by 
a single channel instead of by a group of channels, and if they 
are also all of the same length, a rigorous solution can be 
found. We speak of this as the case of “non-cooperative
channels.”

In the second place, if call lengths are governed by an 
exponential distribution function (which has the effect of 
rendering their end-points “random” in spite of the attempts
of the system to smooth them out) a general solution can be 
obtained even if there is a group of channels instead of a 
single one. 


§ 135. Calls of Equal Length at Non-Cooperative Channels;
The Probability of Congestion j


We shall assume that calls originate individually and col- 
lectively at random, and that if at any time the congestion 
is so great that more than one is awaiting service, they will be 
served in the order in which they originated. As in the case 
of the problem of loss, we denote by P(j) the chance that there
are just j sources either seeking service or being served at the
same time, that is, that the congestion is j. We also assume,
as always, that the system is in statistical equilibrium, so that 
these probabilities are independent of time. 

Let n be the calling rate and T the holding time. Then,
since it is impossible for calls to overlap, the proportion of
time during which our channel can be expected to be busy is
nT, and its expected idle time is 1 − nT. This latter, however,
is obviously P(0). Hence, using the notation nT = ε as
before, we have

$$P(0) = 1 - \varepsilon.$$




Next we consider an interval of length exactly equal to the
holding time, and ask: What is the probability that the congestion
is exactly j at the end of this interval? Obviously,
there will be j calls in progress at the end of the interval provided
there was none in progress when it began and just j
came in meanwhile, or provided there was a congestion of 1
when the interval started and j came in meanwhile (for the
one which was in progress will have discontinued before the
interval ends), or provided there were 2 in progress at the
beginning of the interval and j − 1 came in, and so on. As
the system is in statistical equilibrium, the probability of any
one of these states at the beginning of the interval is equal
to the probability of the same state at its end, hence we arrive
at the law


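$$P(j) = [P(0) + P(1)]\,\frac{\varepsilon^{j} e^{-\varepsilon}}{j!} + P(2)\,\frac{\varepsilon^{j-1} e^{-\varepsilon}}{(j-1)!} + P(3)\,\frac{\varepsilon^{j-2} e^{-\varepsilon}}{(j-2)!} + \cdots + P(j+1)\,e^{-\varepsilon}. \tag{192}$$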


Suppose, now, that we write down the first few of these 
equations. They are: 


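$$\begin{aligned}
P(0) &= [P(0) + P(1)]\,e^{-\varepsilon},\\
P(1) &= [P(0) + P(1)]\,\varepsilon e^{-\varepsilon} + P(2)\,e^{-\varepsilon},\\
P(2) &= [P(0) + P(1)]\,\frac{\varepsilon^{2}}{2!}\,e^{-\varepsilon} + P(2)\,\varepsilon e^{-\varepsilon} + P(3)\,e^{-\varepsilon}.
\end{aligned}$$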


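These equations can easily be solved in order, and give
(when we remember that P(0) = 1 − ε)

$$\begin{aligned}
P(1) &= (1-\varepsilon)\,(e^{\varepsilon} - 1),\\
P(2) &= (1-\varepsilon)\,\bigl[e^{2\varepsilon} - e^{\varepsilon}(1+\varepsilon)\bigr],\\
P(3) &= (1-\varepsilon)\,\Bigl[e^{3\varepsilon} - e^{2\varepsilon}(1+2\varepsilon) + e^{\varepsilon}\Bigl(\varepsilon + \frac{\varepsilon^{2}}{2}\Bigr)\Bigr],
\end{aligned}$$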
the general rule being, as can be shown by more elaborate
methods, that P(j) is always the product of two factors, of
which the first is (1 − ε), and the second is composed of a
series of exponential terms with suitable coefficients. These
coefficients are, in every instance after the first, the sum of
two terms taken from the power series expansion of the exponential
which they multiply: and specifically they are the
first and second, the second and third, the third and fourth,
and so on, respectively. The general formula therefore is


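$$P(j) = (1-\varepsilon)\sum_{k=1}^{j} (-1)^{j-k}\,e^{k\varepsilon}\left[\frac{(k\varepsilon)^{j-k}}{(j-k)!} + \frac{(k\varepsilon)^{j-k-1}}{(j-k-1)!}\right], \tag{193}$$

which holds for j ≥ 2, the second term in the bracket being omitted when k = j.


§ 136. Calls of Equal Length at Non-Cooperative Channels;
The Expected Delay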


Our next objective is the determination of the expected
delay ε(τ). It is at once obvious that the sum 0·P(0) +
1·P(1) + 2·P(2) + ··· represents the aggregate expected
length of all calls in a typical unit of time. This means, not the
aggregate of the times during which they are obtaining service,
but the sum obtained by adding to those “intervals of use,”
the delays as well. But the aggregate of all the intervals of
use is ε. Similarly, the aggregate of all the delays is obtained
by multiplying the expected number of calls by the expected
delay ε(τ). Thus we arrive at the relation

$$\varepsilon + n\,\varepsilon(\tau) = \sum_{j=0}^{\infty} j\,P(j). \tag{194}$$


From this equation the expected delay ε(τ) may be obtained,
provided we can evaluate the summation which occurs in the
right-hand member. To do this we return to the equation
(192), which for our present purpose may be written in the
form

$$P(j) = \sum_{k=0}^{j} P(j-k+1)\,\frac{\varepsilon^{k} e^{-\varepsilon}}{k!} + P(0)\,\frac{\varepsilon^{j} e^{-\varepsilon}}{j!}.$$

It is impossible to substitute this formula in (194) and thereby
evaluate ε(τ), for the attempt to do so leads us to a worthless
identity. However we can accomplish our purpose by a
somewhat indirect artifice.


If we form, not the sum Σ j P(j), but the sum Σ j² P(j)
instead, and if we then interchange the order of the k and j
summations in the usual manner, we arrive at the equation

$$\sum_{j=0}^{\infty} j^{2} P(j) = \sum_{k=0}^{\infty} \frac{\varepsilon^{k} e^{-\varepsilon}}{k!} \sum_{j=k}^{\infty} j^{2}\,P(j-k+1) + P(0) \sum_{j=0}^{\infty} j^{2}\,\frac{\varepsilon^{j} e^{-\varepsilon}}{j!}. \tag{195}$$


Direct evaluation gives us the result 


$$\sum_{j=0}^{\infty} j^{2}\,\frac{\varepsilon^{j} e^{-\varepsilon}}{j!} = \varepsilon(1+\varepsilon). \tag{196}$$


This takes care of the last term of (195). 

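As for the double summation, it can be simplified by
replacing the element of summation j by a new element
h = j − k + 1. Since j² = h² + 2h(k − 1) + (k − 1)², the
double summation splits up into three terms:

$$\sum_{k=0}^{\infty} \frac{\varepsilon^{k} e^{-\varepsilon}}{k!} \sum_{h=1}^{\infty} h^{2} P(h),$$

$$2\sum_{k=0}^{\infty} (k-1)\,\frac{\varepsilon^{k} e^{-\varepsilon}}{k!} \sum_{h=1}^{\infty} h\,P(h),$$

$$\sum_{k=0}^{\infty} (k-1)^{2}\,\frac{\varepsilon^{k} e^{-\varepsilon}}{k!} \sum_{h=1}^{\infty} P(h).$$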


In the form in which the terms now appear the h- and k-
summations are independent of one another and their values
can readily be found. Actual evaluation gives for the k-summations
the results 1, ε − 1 and ε² − ε + 1, respectively;
while the first h-summation is identical with the left-hand side
of (195),¹ and the last h-summation is 1 − P(0). Hence,
when we substitute all these relations, together with (196),
in (195), we obtain

$$\sum_{j=0}^{\infty} j\,P(j) = \frac{\varepsilon^{2} - 2\varepsilon}{2(\varepsilon - 1)}. \tag{197}$$


¹The difference in the lower limit of summation is unimportant, since the term
corresponding to j = 0 is itself zero.


We now return to (194) and note that, since nT = ε, the
left-hand member can be written (1 + ε(τ)/T) ε. Combining
(194) and (197) and making use of this relationship, we arrive
finally at a formula for the expected delay

$$\frac{\varepsilon(\tau)}{T} = \frac{\varepsilon}{2(1-\varepsilon)}. \tag{198}$$
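The results of §§ 135 and 136 can be checked numerically. The sketch below solves the equilibrium equations in order, starting from P(0) = 1 − ε, compares the values with the closed form (an alternating series of exponential terms, used here for j ≥ 2), and confirms that the mean congestion reproduces the expected delay ε/[2(1 − ε)] in units of the holding time. The function names are illustrative, and ε < 1 is assumed throughout.

```python
from math import exp, factorial


def poisson(k: int, eps: float) -> float:
    """Chance that exactly k calls originate during one holding time."""
    return eps ** k * exp(-eps) / factorial(k)


def congestion_by_recursion(eps: float, j_max: int) -> list:
    """Solve the equilibrium equations in order, from P(0) = 1 - eps."""
    P = [1.0 - eps]
    P.append(P[0] * (exp(eps) - 1.0))      # from P(0) = [P(0) + P(1)] e^-eps
    for j in range(1, j_max):
        known = (P[0] + P[1]) * poisson(j, eps)
        known += sum(P[i] * poisson(j - i + 1, eps) for i in range(2, j + 1))
        # The only unknown left in the equation for P(j) is P(j+1) e^-eps.
        P.append((P[j] - known) / poisson(0, eps))
    return P


def congestion_closed_form(eps: float, j: int) -> float:
    """Closed form for j >= 2; the second bracket term is dropped when k = j."""
    total = 0.0
    for k in range(1, j + 1):
        m = j - k
        coeff = (k * eps) ** m / factorial(m)
        if m >= 1:
            coeff += (k * eps) ** (m - 1) / factorial(m - 1)
        total += (-1) ** m * exp(k * eps) * coeff
    return (1.0 - eps) * total


def expected_delay_ratio(eps: float) -> float:
    """Expected delay divided by the holding time."""
    return eps / (2.0 * (1.0 - eps))


def delay_ratio_from_congestion(eps: float, j_max: int = 40) -> float:
    """Same quantity from eps * (1 + delay/T) = sum of j * P(j)."""
    P = congestion_by_recursion(eps, j_max)
    mean_congestion = sum(j * p for j, p in enumerate(P))
    return mean_congestion / eps - 1.0
```

The closed form is an alternating sum of large terms, so for large j the recursion is the safer route in floating-point arithmetic.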


§ 137. Exponential Distribution of Holding Times; Delays at 
Cooperative Groups of Channels 


We found in our discussion of probability of loss — and the 
same applies equally here — that the entire problem could be 
expressed in terms of two “elementary probabilities ”: the 
probability that a new call arrives during a test interval dt 
at the beginning of which the congestion is j, and the probability
of one ending during such an interval. As we are assuming 
that the sources which originate calls have no knowledge of the 
state of the system, and therefore cannot be influenced by it 
until they have actually placed a call, the first of these ele- 
mentary probabilities is here just what it was in the problem 
of loss. Every difference that exists between the two types 
of service must therefore be attributable to some difference in 
the second elementary probability. Let us, then, think for a
moment about the chance of a call, known to be in progress
at the time t = 0 at which we begin to observe it, ending
before t = dt.

Our problem contemplates no change in the nature of the 
call after service is given. Hence the fact that we observe a 
call to be in progress means merely that it must end within a
time equal to its own holding time, and the probability that it
will end during the time dt is therefore dt/T, just as before,
unless we are given information which leads us to infer something 
about the time at which it began. But when the knowledge of 
the degree of congestion is known, this probability is altered — 
for otherwise, as we have already said, both elementary
probabilities, and therefore P(j) itself, would be the same as
in the loss problem. Hence a knowledge of the degree of congestion
must, by inference at least, convey information regarding
the times at which calls were given service.

Now it is a peculiar property of the exponential distribution
of holding times that even absolute knowledge of the time at
which a call was given service does not affect its chance of
ending during dt; and obviously if absolute knowledge does not,
inferential knowledge cannot. This is why the delay problem
can be easily solved when such a distribution of call lengths
is postulated.

To show that the length of time a call has been in progress
does not affect the probability of its termination, when the
call lengths are distributed in accordance with the law²

$$p(t) = \frac{1}{T}\,e^{-t/T}, \tag{199}$$


is a very simple mathematical problem. Suppose the call is
known to have been in progress just t seconds. It must then
be at least t seconds long, the probability of which is

$$p(>t) = \int_{t}^{\infty} p(t)\,dt = e^{-t/T}.$$

If it ends during dt it must have had a length lying between
t and t + dt, the probability of which is

$$p(t)\,dt = \frac{1}{T}\,e^{-t/T}\,dt. \tag{200}$$


¹This statement is undoubtedly true, as the above argument shows, and that
without regard to how the lengths of the calls may be distributed. I have frequently
attempted to formulate a direct argument to replace the reductio ad absurdum
here given. Such an argument should give a value for the elementary probability of
termination under congestion j, and it would be a very easy matter to formulate a
complete solution if this were known. The direct relationship, however, still remains
as baffling as ever.

²For the time being we write T and τ, instead of ε(T) and ε(τ), in order to
simplify the appearance of our equations.

³Obviously p(t) dt (the chance of a call ending during dt) is the chance of it lasting
until t [which is p(> t)] multiplied by the conditional chance that it does not
last beyond t + dt [which we may write $p_{>t}(< t + dt)$]. The first two have been
computed, and the third is exactly the probability for which we are seeking.


The quotient of these³ is the chance that the call ends during
dt, and turns out to be dt/T. As this is independent of t, the
statement is proved.
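The quotient argument is easy to check numerically. The sketch below assumes the exponential law with mean holding time T, that is, density (1/T) e^(−t/T) and chance e^(−t/T) of lasting beyond t; the function names are illustrative. It evaluates, to first order in dt, the chance that a call already t seconds old ends within the next dt.

```python
from math import exp


def density(t: float, mean: float) -> float:
    """Exponential density of call lengths with the given mean."""
    return exp(-t / mean) / mean


def chance_of_lasting_beyond(t: float, mean: float) -> float:
    """Chance that a call is at least t long."""
    return exp(-t / mean)


def chance_of_ending(t: float, dt: float, mean: float) -> float:
    """Chance (first order in dt) that a call known to have lasted t
    ends during the next dt: the quotient of the two quantities above."""
    return density(t, mean) * dt / chance_of_lasting_beyond(t, mean)


if __name__ == "__main__":
    for age in (0.0, 10.0, 250.0):
        print(age, chance_of_ending(age, dt=0.01, mean=100.0))
```

Each line printed shows the same value, dt/T, whatever the age of the call.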


§ 138. Exponential Distribution of Holding Times; The Prob- 
ability of Congestion j 


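Suppose there are ν channels serving λ sources, each of
which latter originates an average of n calls per unit time.
Suppose, further, that the call lengths are distributed according
to the exponential law (199). Then the principle of statistical
equilibrium leads to the set of equations

$$\begin{aligned}
(1 - \lambda\alpha\,dt)\,P(0) + \frac{dt}{T}\,P(1) &= P(0),\\
\lambda\alpha\,dt\,P(0) + \Bigl[1 - (\lambda-1)\alpha\,dt - \frac{dt}{T}\Bigr]P(1) + \frac{2\,dt}{T}\,P(2) &= P(1),\\
&\;\;\vdots\\
(\lambda-\nu+1)\alpha\,dt\,P(\nu-1) + \Bigl[1 - (\lambda-\nu)\alpha\,dt - \frac{\nu\,dt}{T}\Bigr]P(\nu) + \frac{\nu\,dt}{T}\,P(\nu+1) &= P(\nu),\\
(\lambda-\nu)\alpha\,dt\,P(\nu) + \Bigl[1 - (\lambda-\nu-1)\alpha\,dt - \frac{\nu\,dt}{T}\Bigr]P(\nu+1) + \frac{\nu\,dt}{T}\,P(\nu+2) &= P(\nu+1),\\
&\;\;\vdots\\
\alpha\,dt\,P(\lambda-1) + \Bigl[1 - \frac{\nu\,dt}{T}\Bigr]P(\lambda) &= P(\lambda),
\end{aligned} \tag{201}$$

where

$$\alpha = \frac{n}{1 - nT - n\tau}.$$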

These equations differ from those obtained in § 116 in just
two respects: The first is, that sources are effectively busy,
so far as the origination of calls is concerned, whether they
are actually being served or are only awaiting service. Hence
p, in (146), takes the form n(T + τ), and gives for n/(1 − p)
the quantity which we have called α. But in the second
elementary probability (148) p means, as before, nT; for it is
only when we find the source actually being served that there
is a chance of that service coming to an end during dt.

In the second place, when the congestion exceeds the
number of available channels, the number of calls in progress
is equal to ν, not to j. Hence in place of (149) we must now
write j dt/T, provided j < ν, but ν dt/T for all larger values of j.

From the first of the equations (201), P(1) may be found
in terms of P(0). Then from the second P(2) can be found
in terms of P(0), and so on.¹ As we should expect, the result
takes different forms according as j is less than or greater than
ν. It is

$$\begin{aligned}
P(j) &= C^{\lambda}_{j}\,\beta^{j}\,P(0), & j &\le \nu,\\
P(j) &= \frac{j!}{\nu!\,\nu^{\,j-\nu}}\,C^{\lambda}_{j}\,\beta^{j}\,P(0), & j &\ge \nu,
\end{aligned} \tag{202}$$

where β has been written for the quantity

$$\beta = \frac{nT}{1 - nT - n\tau}. \tag{203}$$
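So far the constant P(0) in these equations is arbitrary, but it
may be determined by means of the condition

$$\sum_{j=0}^{\lambda} P(j) = 1.$$

It is found to be given by the equation

$$\frac{1}{P(0)} = (1+\beta)^{\lambda} + \sum_{j=\nu+1}^{\lambda} C^{\lambda}_{j}\,\beta^{j}\left(\frac{j!}{\nu!\,\nu^{\,j-\nu}} - 1\right). \tag{204}$$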
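A short numerical sketch may help fix the structure of this solution. It assumes the form P(j) = C(λ, j) β^j P(0) for j ≤ ν, with the extra factor j!/(ν! ν^(j−ν)) for j > ν, and takes β as a given number; it does not attempt the implicit step of relating β to the calling rate through the expected delay. Function and parameter names are illustrative.

```python
from math import comb, factorial


def finite_source_congestion(sources: int, channels: int, beta: float) -> list:
    """Return [P(0), ..., P(sources)] for a delay system with limited sources."""
    weights = []
    for j in range(sources + 1):
        w = comb(sources, j) * beta ** j
        if j > channels:
            # Beyond the channel count only `channels` calls can end at once.
            w *= factorial(j) / (factorial(channels) * channels ** (j - channels))
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def inverse_p0(sources: int, channels: int, beta: float) -> float:
    """1/P(0) as (1 + beta)^sources plus a correction for states above `channels`."""
    correction = sum(
        comb(sources, j) * beta ** j
        * (factorial(j) / (factorial(channels) * channels ** (j - channels)) - 1.0)
        for j in range(channels + 1, sources + 1)
    )
    return (1.0 + beta) ** sources + correction


if __name__ == "__main__":
    P = finite_source_congestion(sources=10, channels=3, beta=0.2)
    print(P[0], 1.0 / inverse_p0(10, 3, 0.2))
```

The two printed numbers agree, which is simply the normalizing condition written in two ways.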

¹The labor of solving (201) is much less if determinants are used than by the
scheme suggested; but adeptness as well as knowledge is required.


These formulae apply in case the number of sources is
limited and the sources are independent. If, on the other
hand, the calls occur individually and collectively at random,
the solutions take somewhat simpler forms that may be
obtained by setting λ infinite in the above expressions. They
are

$$\begin{aligned}
P(j) &= \frac{\varepsilon^{j}}{j!}\,P(0), & j &\le \nu,\\
P(j) &= \frac{\varepsilon^{j}}{\nu!\,\nu^{\,j-\nu}}\,P(0), & j &\ge \nu,
\end{aligned} \tag{205}$$


where ε is the expected traffic density, computed quite without
regard to delays, and where P(0) is found to be

$$\frac{1}{P(0)} = \sum_{j=0}^{\nu} \frac{\varepsilon^{j}}{j!} + \frac{1}{\nu!} \sum_{j=\nu+1}^{\infty} \frac{\varepsilon^{j}}{\nu^{\,j-\nu}}. \tag{206}$$


§ 139. Exponential Distribution of Holding Times; The
Expected Delay τ


The probability P(j) represents the proportion of time
during which j calls are being served or are awaiting service.
If j is not greater than ν, these calls are all being served, while
if j exceeds ν, ν of them are obtaining service and the remaining
j − ν are standing by waiting for an idle trunk. It follows,
therefore, that the aggregate length of all calls served per unit
time must be given by

$$\sum_{j=0}^{\nu} j\,P(j) + \nu \sum_{j=\nu+1}^{\lambda} P(j),$$

while the aggregate length of all the delays which occur per
unit time must be

$$\sum_{j=\nu+1}^{\lambda} (j-\nu)\,P(j).$$

If the latter of these is divided by the expected number of
calls per unit time, the expected delay, τ, is obtained. However,
the average number of calls is λn = ε/T. Hence we have
the formula

$$\tau = \frac{T}{\varepsilon} \sum_{j=\nu+1}^{\lambda} (j-\nu)\,P(j). \tag{207}$$




This can also be thrown into the form¹


Pee P(o) ( ") mle Pon 5) z 
T ev!*P'(x)* as Lees Ce DS) es P'(A—»+1) | 


' (208) 


where the asterisk following a P or Π indicates that this symbol
is to be evaluated as if the expected traffic density were ν/β.
From this equation it is quite possible to find τ, though it
occurs implicitly on the right-hand side due to the relation
(203). The process of computation consists in assigning to β
such values as may be desired, computing the corresponding
τ's from (208), and then finding n from (203). In this way a
table of corresponding values of n and τ is obtained, from
which the delay corresponding to any calling rate may be
found by interpolation.

The computations are not at all simple; but they can be 
performed when it is necessary to answer questions of sufficient 
importance to justify the expense. In many cases it is satis- 
factory to assume that the calls originate individually and 
collectively at random, in which case the formula is much 
simpler. It is derived by exactly the same line of argument, 
and is found to be 


$$\frac{\tau}{T} = \frac{1}{(\nu-\varepsilon)^{2}}\cdot\frac{\varepsilon^{\nu}}{(\nu-1)!}\,P(0). \tag{209}$$
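For calls originating at random, the expected delay can be checked against its definition. The sketch below assumes the congestion probabilities P(j) = ε^j/j! · P(0) for j ≤ ν and P(j) = ε^j/(ν! ν^(j−ν)) · P(0) for j ≥ ν, with ε < ν, and compares the closed form τ/T = ε^ν P(0) / [(ν − ε)² (ν − 1)!] with the direct sum (1/ε) Σ (j − ν) P(j). The function names and the truncation length are illustrative choices.

```python
from math import factorial


def p0_random_calls(eps: float, nu: int) -> float:
    """P(0) from the normalizing condition; the tail above nu is geometric."""
    head = sum(eps ** j / factorial(j) for j in range(nu + 1))
    ratio = eps / nu
    tail = eps ** nu / factorial(nu) * ratio / (1.0 - ratio)
    return 1.0 / (head + tail)


def congestion(eps: float, nu: int, j: int) -> float:
    """P(j) for random calls and nu cooperative channels."""
    p0 = p0_random_calls(eps, nu)
    if j <= nu:
        return eps ** j / factorial(j) * p0
    # Written with the ratio eps/nu to avoid overflow for large j.
    return eps ** nu / factorial(nu) * (eps / nu) ** (j - nu) * p0


def expected_delay_ratio(eps: float, nu: int) -> float:
    """Expected delay divided by the mean holding time (closed form)."""
    return eps ** nu / ((nu - eps) ** 2 * factorial(nu - 1)) * p0_random_calls(eps, nu)


def expected_delay_ratio_by_summation(eps: float, nu: int,
                                      extra_states: int = 2000) -> float:
    """Same quantity as (1/eps) * sum of (j - nu) P(j), truncated."""
    waiting = sum((j - nu) * congestion(eps, nu, j)
                  for j in range(nu + 1, nu + extra_states + 1))
    return waiting / eps


if __name__ == "__main__":
    print(expected_delay_ratio(8.0, 10), expected_delay_ratio_by_summation(8.0, 10))
```

With a single channel and ε = 0.5 the closed form gives an expected delay of exactly one holding time, which is a convenient spot check.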


§ 140. Exponential Distribution of Holding Times; The Probability
of a Delay Exceeding the Length τ if Calls are
Served in the Order in which They Originate


If a test call finds exactly j calls ahead of it, the delay to
which it will be subjected will exceed the length τ if, and only
if, j − ν or less calls end during the time τ. In case j is known,
therefore, the probability of a delay as great as τ can be found
by determining the probability that a preassigned number of
calls end during a given time τ. This is our next objective.

¹In this case (as in several others which follow) we do not attempt to explain the
method by means of which one form of equation follows from another. In each such
case, however, the only processes required are those of routine algebra. They contain
nothing of interest from a probability standpoint.

If we consider any time interval dt, at the beginning of
which all trunks are busy, we know from § 138 that the probability
that a call ends during this interval is ν dt/T. We also
know from § 137 that the chance of a call ending during any
such interval is altogether independent of what may have
happened in any other. Hence these end-points are distributed
individually and collectively at random with an
expectation of ν/T per unit length, and the chance that exactly
i end during the time τ is

$$\left(\frac{\nu\tau}{T}\right)^{i} \frac{e^{-\nu\tau/T}}{i!}, \tag{210}$$

which we shall denote simply by P'(i)**, the two asterisks
indicating that the probability is to be computed as if the
expected traffic density were ντ/T. The chance of j − ν or
less terminating during this time (or, what amounts to the
same thing, the probability of a delay exceeding τ) is


$$p_j(>\tau) = \sum_{i=0}^{j-\nu} P'(i)^{**}.$$


All this is true only provided we know that exactly j calls
are either in progress or awaiting service when our test call
occurs. However, during the proportion P(j) of the time, j
calls are in progress; and the calling rate during such intervals
is (λ − j)α calls per unit time. It therefore follows that the
number of calls originated per unit time which find just j
calls preceding them and are delayed more than τ seconds is

$$(\lambda - j)\,\alpha\,P(j)\,p_j(>\tau).$$


This formula obviously applies for any value of j; and since
any call which is delayed a longer time than τ must have been
preceded by some number of calls, it follows that the number
per unit time which may be expected to be delayed more than
τ may be obtained by summing this expression for such values
of j as are capable of causing delays. The result is

$$\sum_{j=\nu}^{\lambda} (\lambda - j)\,\alpha\,P(j)\,p_j(>\tau).$$

When this result is divided by λn (that is, by the total number
of calls which are expected to originate per unit time) it gives
us the probability of a test call experiencing a delay greater
than τ.

This formula can also be put in a somewhat more convenient
form for purposes of computation. It is, in fact, equal to


y” pelos? 
e(v—1)!*P’(0)* 
As in every other case, it becomes much simpler if we assume
that the calls originate individually and collectively at random.
The result is then

$$P(>\tau) = P(0)\,\frac{\varepsilon^{\nu}}{(\nu-1)!\,(\nu-\varepsilon)}\,e^{-(\nu-\varepsilon)\tau/T}. \tag{212}$$
It must be borne in mind that these results are only valid 
provided calls are served in the order in which they originate. 
If apparatus is assigned in some other fashion a different dis- 
tribution of delays will occur. The results of §§ 139, 141 
and 142, however, do not depend upon this assumption. 
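The argument of this section, for calls originating at random and served in order of origination, can be traced in a few lines. The sketch below assumes the congestion probabilities P(j) = ε^j/(ν! ν^(j−ν)) · P(0) for j ≥ ν, with ε < ν. It sums, over every congestion j ≥ ν, the chance of finding that congestion multiplied by the chance that j − ν or fewer calls end during τ, and compares the total with the closed form P(0) ε^ν e^(−(ν−ε)τ/T) / [(ν − 1)! (ν − ε)]. Names and the truncation length are illustrative choices.

```python
from math import exp, factorial


def p0_random_calls(eps: float, nu: int) -> float:
    """P(0) from the normalizing condition; the tail above nu is geometric."""
    head = sum(eps ** j / factorial(j) for j in range(nu + 1))
    ratio = eps / nu
    return 1.0 / (head + eps ** nu / factorial(nu) * ratio / (1.0 - ratio))


def delay_exceeds(eps: float, nu: int, tau_over_T: float) -> float:
    """Closed form for the chance that a call waits longer than tau."""
    p0 = p0_random_calls(eps, nu)
    return (p0 * eps ** nu / (factorial(nu - 1) * (nu - eps))
            * exp(-(nu - eps) * tau_over_T))


def delay_exceeds_by_summation(eps: float, nu: int, tau_over_T: float,
                               extra_states: int = 400) -> float:
    """Sum over congestion j = nu, nu+1, ... of P(j) times the chance that
    j - nu or fewer calls end during tau (endings occur at rate nu/T)."""
    mean_endings = nu * tau_over_T
    state_prob = p0_random_calls(eps, nu) * eps ** nu / factorial(nu)   # P(nu)
    poisson_term = exp(-mean_endings)      # chance of exactly 0 endings
    poisson_cum = poisson_term             # chance of 0 or fewer endings
    total = 0.0
    for waiting in range(extra_states + 1):        # waiting = j - nu
        total += state_prob * poisson_cum
        state_prob *= eps / nu                     # step from P(j) to P(j+1)
        poisson_term *= mean_endings / (waiting + 1)
        poisson_cum += poisson_term
    return total


if __name__ == "__main__":
    print(delay_exceeds(8.0, 10, 0.5), delay_exceeds_by_summation(8.0, 10, 0.5))
```

Setting τ to zero in either function gives the proportion of calls that are delayed at all.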


ey Ee = Dye z SW iae atl (pap sa) (221) 


§ 141. Exponential Distribution of Holding Times; The Pro- 
portion of Delayed Calls 


By setting τ equal to 0 in P(> τ) we obtain the proportion
of delayed calls. From (211) we get
Shien 


eC Ol= Tee EG P(o) STNG (213) 


the asterisk having the same meaning as always. From (212) 
we get the even simpler formula 


$$P(>0) = P(0)\,\frac{\varepsilon^{\nu}}{(\nu-1)!\,(\nu-\varepsilon)}. \tag{214}$$




§ 142. Exponential Distribution of Holding Times; The Expected 
Delay of Delayed Calls 


The expected delay, τ, obtained in § 139 is analogous to
the result which would be obtained by placing a large number 
of test calls, noting the delay to which each was subjected, 
adding all these delays together, and dividing by the number 
of calls. Many of the calls, however, would be subject to no 
delay, and the average thus derived would apportion the total 
delay among these as well as among the delayed calls. 

If we desire to know, not this expected delay τ but the
delay τ₁ which a call may be expected to have if it is delayed
at all, we need only divide τ by the proportion of delayed calls,
P(> 0). The result is


(; a are = ites) aa 7 = P'(y —y+ ne 
B B ; (215) 
tell, ; 


or, if calls occur individually and collectively at random,

$$\frac{\tau_1}{T} = \frac{1}{\nu - \varepsilon}. \tag{216}$$


These results may complete our discussion. It is obvious 
that we might, by introducing various shades of meaning into 
our assumptions, prolong the study indefinitely, and that quite 
without the necessity of ever passing far from conditions for 
which we might easily find analogues in practice. What we 
have done, however, has probably shown the two things for 
which it was devised: the highly complex nature of the prob- 
lems to which “traffic” in its most general sense leads us, and
something — though by no means all — of the methods which 
in part at least meet the needs of such problems. If at the 
same time we have painted a rather wearisome picture, no 
great harm will have been done; for after all it was only the 
old masters to whom the gods granted a virgin field in
which to grow those things which were easiest, and to them
no tools were given. We, who have inherited the implements 
of their fashioning, cannot well complain if the fields require 
more labor. 


PROBLEMS 


1. Prove (170) and (171). 


2. What is the probability of hunting more than & terminals with 
a single selector starting from rest, and a straight multiple? What 
is the probability of hunting more than v? 


3. Prove that the exponential law (1) is the only distribution of 
holding times which possesses the property discussed in § 137. 


(By keeping the expression for p(> t) in terms of p(t), an expres-
sion can be found for p_t(> t + dt) which is perfectly general;
that is, it is true for any function p(t). The property in question
states that this quantity does not vary with t; which leads at once
to a differential equation.)


4. Find the probability that a call is delayed, no matter how much, 
at a non-cooperative channel. 


5. The ε(τ) given by (198) is the “unconditional” expectation
of delay. It is the analogue of an “average delay” formed from a
number of calls, some of which suffered no delay at all. We could,
however, eliminate from this group all those that suffered no delay,
leaving only the “delayed calls”; and we could then make use of
this residual group to find the “average delay of delayed calls.”
To such an average corresponds a “conditional expectation,”
namely, “the expectation of delay if the call is delayed,” or in
symbolical form ¢;(r). Find ¢s(r) at a non-cooperative channel. 


REFERENCES FOR OUTSIDE READING 


I. General Studies:


1. Erlang: Solution of Some Problems in the Theory of Prob-
abilities of Significance in Automatic Telephone Exchanges,
Post Office Electrical Engineers’ Journal, Vol. 10 (1918),
pp. 189-197; Telefon-Ventetider. Et Stykke Sandsyn-
lighedsregning, Matematisk Tidsskrift (1920), pp. 25-42.


2. Engset: Die Wahrscheinlichkeitsrechnung zur Bestimmung
der Wählerzahl in automatischen Fernsprechämtern, Elektro-
technische Zeitung (1918), pp. 304-305.


3. O’Dell: Theoretical Principles of the Traffic Capacity of
Automatic Switches, Post Office Electrical Engineers’
Journal, Vol. 13 (1920), pp. 209-223.


II. Methods of Numerical Computation:


4. Molina: Computation Formula for the Probability of an
Event Happening at Least C Times in N Trials, American
Mathematical Monthly, Vol. 20 (1913), pp. 190-193;
An Interpolation Formula for Poisson’s Exponential
Binomial Limit, American Mathematical Monthly,
Vol. 22 (1915), pp. 223-224.


5. Campbell: Probability Curves Showing Poisson’s Expo-
nential Summation, Bell System Technical Journal, Vol. 2
(1923), pp. 95-113.


CHAPTER XI
FLUCTUATION PHENOMENA IN PHYSICS


§ 143. Introductory Remarks; Notation 


Among the many important applications of the Theory of 
Probability to scientific problems, those which deal with the 
Kinetic Theory of Gases and other statistical phenomena 
which arise from the molecular structure of matter and elec- 
tricity form a class by themselves to which we may broadly 
apply the term “ fluctuation phenomena.” We have already 
had a few simple illustrations of such studies in §§ 64, 65, 66 
and 88. It is the purpose of the present chapter to present a 
few other scattered results, with a view to illustrating the sort 
of thought processes that are required in this field. 

To begin with, we shall give what is probably the most 
satisfactory derivation of Maxwell’s Theorem that the velocities 
of gas molecules are distributed in accordance with the Normal
Law. This derivation, like the one given in § 64, is founded 
upon a line of argument originally carried out by Maxwell 
himself; and while it is not satisfactory in all respects, as we 
shall see, it at least possesses the merit of leading to some very 
important physical and thermodynamic consequences. Before 
entering upon it directly, however, we have need to derive some 
dynamical information with which the reader may not be 
familiar. 

As notation, we shall use the letters x, y, z to represent the
coordinates of a point in space, and u, v, w for velocity com-
ponents in the directions of the three coordinate axes. In
order to avoid infinite repetition we shall often use the capitals
X and U as substitutes for the triplets x, y, z and u, v, w,
and shall write dX and dU in place of the more cumbersome
dx dy dz and du dv dw. For example, we shall speak of a
molecule as being in the element dX, meaning thereby that its
center lies in the element of volume bounded by x and x + dx,
y and y + dy, z and z + dz. Or we shall say that it is “of
velocity class U,” meaning that its velocity components lie
within an infinitesimal region du dv dw about the values
u, v, w.


§ 144. The Dynamics of Collision 


We consider two identical, perfectly hard, perfectly smooth and
perfectly elastic spheres that move with velocities U and U’. We
suppose that one of these is represented by the inner sphere of Fig. 48,
and that the other collides with it, the point of contact being some-
where in the element of area dA. At the instant when the collision
takes place, the center of the second sphere must be located on
the surface of the outer sphere of Fig. 48; that is, the centers must
be separated by a distance equal to the diameter of the spheres.
Moreover, the center of the second sphere must be somewhere
within the element of area 4 dA which is homologous to the element
dA upon the inner sphere. We ask, under these circumstances,
for the velocities $\bar{u}, \bar{v}, \bar{w}$ and $\bar{u}', \bar{v}', \bar{w}'$ which these spheres will have
after collision.

[Fig. 48]

From the principle of the conservation of momentum we know
that what one sphere gains in momentum must be lost by the other.
Hence we have

$$
\begin{aligned}
u - \bar{u} &= \bar{u}' - u' = g_u,\\
v - \bar{v} &= \bar{v}' - v' = g_v,\\
w - \bar{w} &= \bar{w}' - w' = g_w,
\end{aligned}
\tag{217}
$$

the g’s being introduced merely for simplicity of expression.

Next, if the spheres are perfectly smooth, the acceleration must
be in the direction of their line of centers OA at the moment of
collision. We call the direction cosines of this line of centers
λ, μ, ν. Hence we have

$$
\frac{g_u}{\lambda} = \frac{g_v}{\mu} = \frac{g_w}{\nu} = S, \tag{218}
$$

the S being again a constant introduced for simplicity.
Finally, energy must be conserved, so that

$$
\bar{u}^2 + \bar{v}^2 + \bar{w}^2 + \bar{u}'^2 + \bar{v}'^2 + \bar{w}'^2 = u^2 + v^2 + w^2 + u'^2 + v'^2 + w'^2. \tag{219}
$$

We now write the six equations (217) in the form

$$
\begin{aligned}
\bar{u} &= u - g_u, & \bar{u}' &= u' + g_u,\\
\bar{v} &= v - g_v, & \bar{v}' &= v' + g_v,\\
\bar{w} &= w - g_w, & \bar{w}' &= w' + g_w,
\end{aligned}
$$

square them, add the results together, and then subtract (219) from
them. The result is

$$
2(g_u^2 + g_v^2 + g_w^2) + 2[(u' - u)\,g_u + (v' - v)\,g_v + (w' - w)\,g_w] = 0,
$$

or, by using (218) and remembering that the sum of the squares of
the direction cosines is of necessity equal to unity,

$$
2S^2 + 2S[\lambda(u' - u) + \mu(v' - v) + \nu(w' - w)] = 0.
$$

It follows, then, that S must either be zero or take the value

$$
S = \lambda(u - u') + \mu(v - v') + \nu(w - w'). \tag{220}
$$

Whichever of these values S may have, the substitution of
$g_u = \lambda S$, $g_v = \mu S$, $g_w = \nu S$ in (217) gives the desired relations

$$
\begin{aligned}
\bar{u} &= u - \lambda S, & \bar{u}' &= u' + \lambda S,\\
\bar{v} &= v - \mu S, & \bar{v}' &= v' + \mu S,\\
\bar{w} &= w - \nu S, & \bar{w}' &= w' + \nu S.
\end{aligned}
\tag{221}
$$


It is now obvious that the value S = 0 is not the true one, for it 
would require the velocities after contact to be the same as before. 

We have now derived the equations which define the new veloc-
ities in terms of the old ones and of the direction cosines of the line
of centers at the instant of collision. But we still want a geometrical
meaning for the letter S. This we get by noting that the velocity of
the sphere of class U relative to the one of class U’ has the velocity
components u − u’, v − v’, w − w’. Its absolute value is therefore
$R = \sqrt{(u - u')^2 + (v - v')^2 + (w - w')^2}$ and its direction cosines
are the ratios of the three components to this absolute value R.
If we denote these direction cosines by $\lambda_R, \mu_R, \nu_R$ and substitute them
in (220) we find that

$$
S = [\lambda\lambda_R + \mu\mu_R + \nu\nu_R]\,R.
$$

However, by an elementary theorem in analytic geometry, the
combination of direction cosines which occurs in the brackets is equal
to the cosine of the angle included between the directions in question;
that is, between the line of centers at the moment of collision,
and the direction of the relative velocity. If we call this angle θ,
(220) takes the simpler form

$$
S = R\cos\theta. \tag{222}
$$

In other words, the symbol S represents the projection of the relative
velocity upon the line of centers.

We shall next show that, if we were to start with two spheres
having velocities $\bar{U}$ and $\bar{U}'$ (that is, the velocities with which the
other pair emerged from their collision), and if we allowed them to
collide in such a way that, at the instant of collision, the line of
centers was the same as before, they would emerge with velocities
U and U’ (that is, with velocities equal to those which the other pair
had before collision).

Of course, since the argument by means of which we obtained
(221) is a perfectly general one, it follows that the velocity com-
ponents of our new set of spheres, after collision, must be given by
the equations

$$
\bar{\bar{u}} = \bar{u} - \lambda\bar{S}, \qquad \bar{\bar{u}}' = \bar{u}' + \lambda\bar{S}, \qquad \ldots, \tag{223}
$$

in which symbols such as $\bar{\bar{u}}$ represent the desired velocities after
collision, and $\bar{S}$ bears the same relation to our new velocities that
S did to the old ones. As the line of centers is the same as before,
the direction cosines remain unchanged.

The quantity $\bar{S}$ is the central element in our argument. By
merely changing the symbols in (220) so as to apply to our present
case we find that it must satisfy the relation

$$
\bar{S} = \lambda(\bar{u} - \bar{u}') + \mu(\bar{v} - \bar{v}') + \nu(\bar{w} - \bar{w}').
$$

But from (221) we have $\bar{u} - \bar{u}' = u - u' - 2\lambda S$, with entirely
similar equations in the v’s and w’s. Making use of these we easily
throw the equation for $\bar{S}$ into the form

$$
\bar{S} = \lambda(u - u') + \mu(v - v') + \nu(w - w') - 2(\lambda^2 + \mu^2 + \nu^2)\,S,
$$

which is obviously equivalent to $\bar{S} = -S$.

If we now substitute this value of $\bar{S}$ in (223) and compare the
resulting equations with (221), we find that $\bar{\bar{U}}$ and $\bar{\bar{U}}'$ are indeed
identical with U and U’, which is what we set out to prove.
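The relations of this section lend themselves to a direct numerical check. What follows is a minimal sketch, assuming arbitrary illustrative velocities and an arbitrary unit vector for the line of centers; the names `collide` and `kinetic` are hypothetical. It applies (220) and (221) once, confirms that momentum and energy are conserved, and then applies them a second time to the emerging velocities to confirm that the new S is the negative of the old and that the original velocities are recovered.

```python
def collide(U, U_prime, n):
    """Apply (220) and (221). n = (lambda, mu, nu) are the direction cosines of the line of centers."""
    S = sum(c * (a - b) for c, a, b in zip(n, U, U_prime))      # equation (220)
    U_bar = tuple(a - c * S for c, a in zip(n, U))              # first column of (221)
    U_prime_bar = tuple(b + c * S for c, b in zip(n, U_prime))  # second column of (221)
    return U_bar, U_prime_bar, S


def kinetic(U, U_prime):
    """Sum of squared velocity components (twice the energy per unit mass)."""
    return sum(a * a for a in U) + sum(b * b for b in U_prime)


# Illustrative values only; n is a unit vector since 4/9 + 1/9 + 4/9 = 1.
U, U_prime = (1.0, 2.0, 3.0), (-1.0, 0.5, 2.0)
n = (2 / 3, 1 / 3, 2 / 3)

U_bar, U_prime_bar, S = collide(U, U_prime, n)

# Second collision: start from the emerging velocities, same line of centers.
U_back, U_prime_back, S_bar = collide(U_bar, U_prime_bar, n)

print("S =", S, " S_bar =", S_bar)
print("recovered:", U_back, U_prime_back)
```

Up to rounding, the second call returns the velocities with which the first pair started, which is the reversibility property proved above.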


§ 145. The Probable Flux Across a Surface 


Let us now consider a set of spheres moving about in the
fashion in which the gas molecules are supposed to move in the
Kinetic Theory. We shall suppose these spheres, or molecules,
to be hard and smooth and elastic, just as we did in § 144,
and we shall further agree to represent by the formula
p(u, v, w, x, y, z, t) du dv dw dx dy dz the chance of a molecule
lying within the parallelepiped bounded by the planes x, x + dx;
y, y + dy; z, z + dz and having velocity components that lie
between u and u + du, v and v + dv, w and w + dw respectively.
It is, of course, our purpose to find what this function is;
but our way of arriving at the answer will be a somewhat
indirect one, and for the moment the symbol itself is quite
sufficient. It should be noted, however, that we are not
assuming that one element of volume is just as likely to contain
a molecule as another, for we have made p a function of x, y, z;
nor are we assuming that the gas is in a state of statistical
equilibrium, for we have made p a function of time.

We start, then, with this symbol p(U, X, t) dU dX for the
chance of there being a molecule of class U in the element dX
at the time t, and ask for the probability of such a molecule
passing out between the time t and t + dt, dt being supposed
infinitesimally small. We shall represent this probability of
leaving by

$$
P_{\text{out}}(X)\,dU\,dX\,dt.
$$


We refer to Fig. 49 and note that, if a sphere crosses the
boundary of dX at all, it must cross one of the six faces. We
shall therefore find the probability we desire if we consider
each of these six faces separately. Let us take first the pair
which are perpendicular to the x-direction.

[Fig. 49]

A molecule of class U cannot leave this volume element across
the left-hand face if u is positive; if u is negative it cannot
leave across the right-hand face. Hence the chance of the
molecule leaving across one of these faces in time dt is just the
chance of it lying in a thin slab inside dX (Fig. 49), adjacent to
the right-hand face if u is positive, or to the left-hand face if
u is negative. We thus find that the chance of a molecule
leaving during dt is just

p(u, v, w, x + dx, y, z, t) du dv dw (u dt) dy dz

if u is positive, and

p(u, v, w, x, y, z, t) du dv dw (− u dt) dy dz

if u is negative, the use of x + dx in the former of these expres-
sions being dictated by the fact that the slab lies adjacent to
the right-hand face of dX.

Of course there are entirely similar expressions for the
chances of leaving across either of the other pairs of faces,
and by summing these expressions we might find the chance of
leaving, $P_{\text{out}}(X)$. The result would be a rather complicated one,
however, as it would have different forms for each of the eight
possible combinations of signs of u, v, w. Fortunately, we have
no use for the formula in just this form, and need not carry the
matter into greater detail. Instead, we shall turn our attention
to the chance of entering dX in this same way, which we shall call
$P_{\text{in}}(X)\,dU\,dX\,dt$.

The argument for this case is just the same as was the
argument for leaving. The only difference is that the slab
within which the molecule must lie in order to cross within dt
is now outside dX instead of inside, and is adjacent to the
opposite face in every instance. It follows that the chance of
entering across the pair of faces which we have been considering
is

p(u, v, w, x, y, z, t) du dv dw (u dt) dy dz

if u is positive, and

p(u, v, w, x + dx, y, z, t) du dv dw (− u dt) dy dz

if u is negative.

We shall not be much interested in either $P_{\text{in}}(X)$ or $P_{\text{out}}(X)$
directly, but we shall need to know the difference $P_{\text{in}}(X) - P_{\text{out}}(X)$,
which may readily be found by discussing one pair of faces
at a time and adding the results. For the pair which we have
had under consideration, we readily see that the difference is

[p(u, v, w, x, y, z, t) − p(u, v, w, x + dx, y, z, t)] u du dv dw dy dz dt,

no matter whether u be positive or negative. As dx itself is
supposed to be infinitesimally small, this may be rewritten
in the form

$$
-\,u\,\frac{\partial p}{\partial x}\,dU\,dX\,dt.
$$

It is obvious that the analogous probabilities for the other
pairs of faces are $-\,v\,\frac{\partial p}{\partial y}\,dU\,dX\,dt$ and $-\,w\,\frac{\partial p}{\partial z}\,dU\,dX\,dt$,
whence we obtain as our final result

$$
P_{\text{in}}(X) - P_{\text{out}}(X) = -\left(u\,\frac{\partial p}{\partial x} + v\,\frac{\partial p}{\partial y} + w\,\frac{\partial p}{\partial z}\right), \tag{224}
$$

the differential elements being dU dX dt on both sides of the
equation.


§ 146. Change of Velocity Class 


Another thing in which we shall be interested in later sec-
tions is the chance of a molecule changing from the velocity
class U to some other velocity class during the time dt, and
the analogous chance of it changing from some other velocity
class to the class U.

The chance of a sphere of class U being within the element
of volume dX at the time t is p(U, X, t) dU dX. If a
sphere of class U’ comes along and collides with it during the
time interval dt, this latter sphere must have been located
somewhere within a well-defined volume element when the
time interval dt began. Specifically, it must have been within
that element which the area 4 dA (Fig. 48) would sweep out,
if it were to travel for the time dt with a velocity equal to the
velocity R of the U-class sphere relative to the U’-class sphere.

It is not difficult to see that the volume of this element is
just 4 R cos θ dA dt, where θ is the angle between the relative
velocity R and the line of centers at the instant of collision.
From (222) we see at once that it may be written in the alterna-
tive form 4 S dA dt.

Now the chance of a collision of the sort under consideration
is just the product of the probability of a U-class sphere in dX,
which we know, by the conditional probability that there is a
U’-class sphere in the other volume element if there is a U-class
sphere in dX. What this conditional probability is we do not
know. Moreover, no way has ever been found to determine
it without advance knowledge of the distribution function
p(U, X, t), which we obviously do not possess.¹ Hence we
seem to be blocked from further exact progress. If, however,
we assume that the existence of a U’-class molecule in any
element of volume is not in any way affected by the proximity
of U-class molecules (in other words, if we assume the two
events to be independent) we may make use of the uncon-
ditional probability of a U’-class molecule in the element
4 S dA dt and arrive at a result; for that unconditional prob-
ability is just p(U’, X’, t) dU’ 4 S dA dt, where X’ denotes
some point located in the element 4 S dA dt. Therefore the
chance of a collision occurring between two such molecules is

$$
4\,S\,p(U, X, t)\,p(U', X', t)\,dU\,dX\,dU'\,dA\,dt. \tag{225}
$$

¹ Jeans, in his Dynamical Theory of Gases, gives a development of Maxwell’s Law
which he believes to be free from this objection. The demonstration is phrased in
terms of Statistical Mechanics, and while it does not need our assumption in the
exact form in which we have made it, it appears to me that something very similar
lurks in assuming a uniform density for the “dust of points” in his statistical space
of 6N dimensions.

On the other hand, Jeans has shown that if the gas is very tenuous and is dis-
tributed in accordance with Maxwell’s Law, the chance of a U’-class molecule in any
element is independent of the situation in neighboring elements; which seems to
complete the argument in favor of Maxwell’s Law a posteriori for such rare gases.
For gases in which the molecules occupy an appreciable portion of the available space
he has shown that the condition of independence does not exist.


This is the chance of a collision of a special kind. We
must not forget, however, that we are seeking for the chance
of a molecule leaving the class U, and this result will follow
from any collision whatever.¹ Hence it is necessary for us to
sum the probability (225) over every class of molecule with
which our U-class molecule could possibly collide. The result
is

$$
P_{\text{out}}(U)\,dU\,dX\,dt = 4\,dU\,dX\,dt \int dU' \int dA\; S\,p(U, X, t)\,p(U', X', t). \tag{226}
$$


This is the chance of a molecule leaving class U by collision.
As for the chance of a molecule entering class U, that can best
be obtained by indirection. We saw in § 144 that if the
colliding molecules had velocities $\bar{U}$ and $\bar{U}'$, and collided so
that their line of centers had the appropriate direction, they
would give rise to two new velocities U and U’. To get the
chance of a molecule entering class U in just this way, there-
fore, we need only multiply together the probabilities of their
having had, before collision, velocities which lay within the
ranges to which collisions between U- and U’-class molecules
might lead. Let us denote these ranges by $d\bar{U}$ and $d\bar{U}'$.
Then we have at once, for the probability of a collision of
exactly the type about which we have been speaking,

$$
4\,\bar{S}\,p(\bar{U}, X, t)\,p(\bar{U}', X', t)\,d\bar{U}\,dX\,d\bar{U}'\,dA\,dt;
$$

and therefore, for the total probability of entering class U
through any sort of collision whatever,

$$
P_{\text{in}}(U)\,dU\,dX\,dt = 4\,d\bar{U}\,dX\,dt \int d\bar{U}' \int dA\; \bar{S}\,p(\bar{U}, X, t)\,p(\bar{U}', X', t). \tag{227}
$$

¹ So long as the differential element dU is of finite magnitude it is possible for
the molecule to undergo a collision which changes its velocity by so little that it still
remains in the same class. The chance of this vanishes with dU, however, and does
not invalidate our argument.

We have seen in § 144, however, that $\bar{S} = -S$; and we have
seen in § 68 that the Jacobian of the transformation (221) is
equal to unity, which means that $d\bar{U}\,d\bar{U}' = dU\,dU'$. We
conclude at once, therefore, that (227) may be written in the
form ¹

$$
P_{\text{in}}(U)\,dU\,dX\,dt = 4\,dU\,dX\,dt \int dU' \int dA\; S\,p(\bar{U}, X, t)\,p(\bar{U}', X', t). \tag{228}
$$


Again, as in § 145, we shall be interested less in $P_{\text{in}}(U)$ and
$P_{\text{out}}(U)$ separately than in their difference. We shall therefore
subtract (226) from (228) and write

$$
P_{\text{in}}(U) - P_{\text{out}}(U) = 4 \int dU' \int dA\; S\,(\bar{p}\,\bar{p}' - p\,p'). \tag{229}
$$

Here p, p’, $\bar{p}$, $\bar{p}'$ stand for p(U, X, t), p(U’, X’, t), p($\bar{U}$, X, t)
and p($\bar{U}'$, X’, t) respectively. Strictly speaking, the symbols
p and $\bar{p}$ in (229) refer to the chance of a molecule being within
an element of volume located at the point X, while p’ and $\bar{p}'$
refer to the probabilities at points situated a distance from X
equal to the diameter of a molecule. This is obvious from
Fig. 48. But the molecules are so small that p(U’, X’, t) and
p($\bar{U}'$, X’, t) will generally differ by negligible amounts from
p(U’, X, t) and p($\bar{U}'$, X, t). In what follows, therefore, we
shall assume that all the symbols in (229) refer to the point X.


¹ We have dropped the sign of S for just the same reason as we ignore the sign of
the Jacobian. It is the result of certain conventions as to directions and is more
easily corrected in this arbitrary fashion than kept correct. In the present instance
we know that all quantities must of necessity be positive.


§ 147. The Fundamental Equation of the Kinetic Theory of
Gases


We are now prepared to derive the integro-differential
equation which forms the mathematical basis for the entire
Kinetic Theory of Gases. We suppose that an element of
volume dX is taken under observation at the instant t, and
that the observation is extended to the time t + dt. We ask
for the probability that, at the end of this interval, the element
contains a molecule moving with the velocity U. In other
words, we ask for the probability p(U, X, t + dt).

This probability, however, can be expressed as the sum of
five¹ related probabilities all of which have already been
found. They are:


1. The probability that a molecule of velocity class U was
in dX at the time t.² This is, of course, p(U, X, t).


2. The chance that such a molecule was in dX at time t but
wandered across the boundary during dt. We have already
denoted this by the symbol $P_{\text{out}}(X)\,dU\,dX\,dt$.


3. The chance that such a molecule was in dX at time t, but
suffered a collision which caused it to alter its velocity. We
have denoted this by $P_{\text{out}}(U)\,dU\,dX\,dt$.


4. The chance that a molecule of velocity class U was near
enough dX at the time t to wander into it during dt. We have
denoted this by $P_{\text{in}}(X)\,dU\,dX\,dt$.


5. The chance that a molecule of velocity class other than U
was in dX at time t, and suffered a collision during dt which
caused it to enter the class U. We have denoted this prob-
ability by $P_{\text{in}}(U)\,dU\,dX\,dt$.


¹ If the molecules are subject to extraneous forces, such as gravitational forces,
for example, it is possible for a molecule to enter or leave class U by acceleration.
If we were to take account of these possibilities we would have seven items, instead
of five, to consider. The result would be to add to the left-hand side of (230) three
additional terms

$$
F_x\,\frac{\partial p}{\partial u} + F_y\,\frac{\partial p}{\partial v} + F_z\,\frac{\partial p}{\partial w},
$$

$F_x$, $F_y$ and $F_z$ being the components of the applied force in the three coordinate direc-
tions. This would lead to a certain amount of algebraic complication and would
cause us to reach a different distribution function p: the gas in a large tank is denser
at the bottom and rarer at the top than it would be if the earth exerted no gravitational
attraction, for example. But it would lead to no new ideas of a statistical nature
and is not in place here. We assume, therefore, that our gas is not under the influence
of such extraneous forces.

² To be strictly accurate we should add “and remained there.” But if dt is
infinitesimal, the chance of it not remaining there is also infinitesimal, and may be
ignored.


Adding together these five probabilities (the second and third,
being chances of loss, with the negative sign), as given by (229)
and (224), and observing that

$$
p(U, X, t + dt) - p(U, X, t) = \frac{\partial p}{\partial t}\,dt,
$$

we arrive at the formula

$$
\frac{\partial p}{\partial t} + u\,\frac{\partial p}{\partial x} + v\,\frac{\partial p}{\partial y} + w\,\frac{\partial p}{\partial z} = 4 \int dU' \int dA\; S\,[\bar{p}\,\bar{p}' - p\,p']. \tag{230}
$$


This is the fundamental equation of the Kinetic Theory. It is 
an integro-differential equation, among the solutions of which 
must be found the distribution function p(U, X, ¢) for which 
we are seeking; that is, the distribution of velocities in a gas 
which is in statistical equilibrium. But it is much more 
general than the result which we shall obtain from it; for as 
we noted at the beginning; we have throughout taken account 
of the possibility of time variation. If then, we were to put 
a gas in some state other than equilibrium and then leave it 
to itself, the distribution function which governed its return 
to equilibrium would have to be given, instant by instant, by 
some one of the solutions of (230). From this equation, then, 
we should be able to learn many things about such readjust- 
ments within a gas — the time required to carry them out, for 
instance; and it has actually proved very useful in the treat- 
ment of such problems, which are, however, too technical 
for an elementary text in the Theory of Probability. 


§ 148. The H-Function 


So far, we have spoken only of the chance of an element of
volume possessing a molecule of a certain kind. We now
desire, for a time, to talk of the chance of a molecule being in
a certain state at a certain time. For this purpose, let us sup-
pose that there are within the container N molecules and that
we have some means of identifying them from one another.
Further, we assume that, if there is a molecule of class U in
dX at time t, it is just as likely to be any one of the set as any
other.

We denote by p*(U, X, t) the chance of our particular
molecule being in such a place and state at time t. Then
obviously

$$
p^*(U, X, t) = \frac{1}{N}\,p(U, X, t). \tag{231}
$$

It is, of course, the same for every molecule.
Let us now compute the expectation of the logarithm of
this probability, after the fashion explained in § 80. Formally,
at least, the result is

$$
\epsilon(\log p^*) = \int dU \int dX\; p^*(U, X, t)\,\log p^*(U, X, t).
$$

Using (231), and remembering that the integral of p*(U, X, t)
over all possible places and velocities must of necessity equal
unity, we can easily reduce this to the form ¹

$$
H(t) = N\,\epsilon(\log p^*) = -\,N \log N + \int dU \int dX\; p \log p. \tag{232}
$$

Had we chosen a time t + dt instead of a time t we would
have gotten an entirely analogous result, except that in every
instance the value of t would have been different. It follows,
then, that H(t) must satisfy the differential equation

$$
\frac{dH}{dt} = \int dU \int dX\; \frac{\partial p}{\partial t}\,(\log p + 1), \tag{233}
$$

which is obtained by straightforward differentiation of (232).
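The reduction to (232) can be checked on a discrete analogue in which the integrals over dU dX become sums over a finite number of cells. This is a minimal sketch under that assumption; the cell values and the function names `H_from_p_star` and `H_from_p` are hypothetical. Each cell carries a value of p (the expected number of molecules in the cell), the values sum to N, and p* = p/N as in (231).

```python
import math


def H_from_p_star(p, N):
    """N times the expectation of log p*, with p* = p / N as in (231)."""
    return N * sum((x / N) * math.log(x / N) for x in p if x > 0)


def H_from_p(p, N):
    """Right-hand side of (232): -N log N + sum of p log p."""
    return -N * math.log(N) + sum(x * math.log(x) for x in p if x > 0)


# Illustrative cell occupancies; they sum to N = 5 molecules.
p = [0.5, 1.5, 2.0, 1.0]
N = sum(p)

print(H_from_p_star(p, N), H_from_p(p, N))
```

The two values agree because the sum of p over all cells equals N, which is the discrete counterpart of the normalisation used in the text.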


¹ The exact way in which the H is defined, and indeed the use of the expectation of
the logarithm of the probability at all, is dictated by classical usage in the Kinetic
Theory of Gases. The interpretation which we give to it, however, is not the usual
one. The latter will be found in any good treatise on the Kinetic Theory.


We have already obtained an expression for ∂p/∂t in
§ 147, and upon substituting this in (233) we have

$$
\begin{aligned}
\frac{dH}{dt} ={}& 4 \int dU \int dX \int dU' \int dA\; S\,(\bar{p}\,\bar{p}' - p\,p')(\log p + 1)\\
&- \int dU \int dX \left(u\,\frac{\partial p}{\partial x} + v\,\frac{\partial p}{\partial y} + w\,\frac{\partial p}{\partial z}\right)(\log p + 1).
\end{aligned}
\tag{234}
$$

The last term of this equation is readily thrown into the
form

$$
-\int dU \int dX \left(u\,\frac{\partial}{\partial x} + v\,\frac{\partial}{\partial y} + w\,\frac{\partial}{\partial z}\right)(p \log p),
$$

which is known by Green’s Theorem to be just the surface
integral, over the entire container of the gas and for all pos-
sible values of the velocity, of the product of the normal
component of U by the quantity p log p. The product
p log p, however, has, at the surface of the container, the same
value for a negative normal component as for a positive one;
for if the walls of the container are not in motion every molecule
which strikes leaves with its normal velocity reversed. Hence
the entire expression is an odd function of this normal com-
ponent, and in summing over every possible value of U it
occurs as often with a positive as with a negative sign. The
entire integral is therefore zero.

We conclude, then, that dH/dt is represented by the first 
term of (234) only. 

Finally, we notice that, had we chosen at the very
beginning to speak of our molecule as belonging to class U’
instead of U, we would have obtained an entirely similar
expression, the only difference occurring in the substitution
of log p’ + 1 in place of log p + 1. If we had chosen to
speak of a molecule of class $\bar{U}$ we would have gotten

$$
\frac{dH}{dt} = 4 \int d\bar{U} \int dX \int d\bar{U}' \int dA\; \bar{S}\,(p\,p' - \bar{p}\,\bar{p}')(\log \bar{p} + 1).
$$

But since $\bar{S}\,d\bar{U}\,d\bar{U}' = S\,dU\,dU'$, as we saw in § 146, this
becomes

$$
\frac{dH}{dt} = 4 \int dU \int dX \int dU' \int dA\; S\,(p\,p' - \bar{p}\,\bar{p}')(\log \bar{p} + 1).
$$

There is also a similar expression which might have been
derived had we chosen to speak of a molecule being of class $\bar{U}'$.
It differs only in the replacement of $\log \bar{p} + 1$ by $\log \bar{p}' + 1$.
Adding the four expressions for dH/dt thus obtained and
dividing by four we reach the more symmetrical result

$$
\frac{dH}{dt} = \int dU \int dX \int dU' \int dA\; S\,(\bar{p}\,\bar{p}' - p\,p')(\log p\,p' - \log \bar{p}\,\bar{p}'). \tag{235}
$$

We may now notice this fact: When $\bar{p}\,\bar{p}'$ is greater than
p p’, the logarithm of the former exceeds the logarithm of the
latter, and conversely. The two factors in the integrand of
(235) are therefore always opposite in sign (unless they are
zero), whence, since S is positive by definition, the entire
integral must be negative. The only possible exception occurs
when the expression $\bar{p}\,\bar{p}' - p\,p'$ is identically zero.
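The sign argument rests on an elementary inequality: for any two positive numbers a and b, the product (a − b)(log b − log a) is never positive, and it vanishes only when a = b. A minimal sketch of a numerical check follows, with a standing for the product of the barred probabilities and b for the product of the unbarred ones; the function name `integrand_factor` and the grid of test values are illustrative assumptions.

```python
import math


def integrand_factor(a, b):
    """The two factors of the integrand of (235), apart from S: (a - b)(log b - log a)."""
    return (a - b) * (math.log(b) - math.log(a))


# Illustrative grid of positive values for the two products.
values = [0.01, 0.1, 0.5, 1.0, 2.0, 10.0]
results = {(a, b): integrand_factor(a, b) for a in values for b in values}

never_positive = all(r <= 0.0 for r in results.values())
zero_only_on_diagonal = all((r == 0.0) == (a == b) for (a, b), r in results.items())
print(never_positive, zero_only_on_diagonal)
```

Since S is taken as positive, an integrand of this form can only lower H or leave it unchanged.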


We conclude that $\dfrac{dH}{dt}$, and therefore $\dfrac{d}{dt}\,\epsilon(\log p^*)$ also,
is never positive. In other words, in any dynamical system of
“molecules” of the sort under discussion, the expectation of
log p*(U, X, t) is a decreasing, or at least not an increasing, func-
tion of the time.

Now we have seen in § 80 that the expectation of log p
is least when the distribution of the variable is completely
random. The fact that the ε(log p*) is continually decreasing
therefore indicates a tendency of the gas, when left to itself,
to approach more and more nearly to a condition of absolute
randomness. We might, indeed, define H as measuring the
“departure from randomness.” Then when H reached its
minimum value the gas would be as near to a completely ran-
dom distribution as the dynamical conditions by which it is
surrounded will permit.


§ 149. Maxwell’s Law of Velocities 


Let us now find what distribution of positions and velocities
is as nearly random as the dynamical conditions of the problem
will permit. The only essential dynamical conditions of
which we know are, that the number of molecules must always
be the same, and that the total energy shall not vary. Cer-
tainly if the number of molecules does not change, the expecta-
tion of that number must have the fixed value N, whence we
get

$$
N = \int dU \int dX\; p. \tag{236}
$$

And if the total energy is a constant, the expectation of the
total energy must be equal to the same constant. It is usual
in physics to denote this constant by $\tfrac{3}{2}kNT$, N being the
number of molecules, T the temperature and k a constant
determined by the mechanical equivalent of heat. If we use
this same notation, our function p must satisfy the relation

$$
W = \tfrac{3}{2}\,kNT = \frac{m}{2} \int dU \int dX\; (u^2 + v^2 + w^2)\,p, \tag{237}
$$

m being the mass of the molecule.
We are now required to make 


H = {du { dX plogp 


a minimum subject to the conditions that N and W shall 
remain unchanged. Following out the process explained. in 
§ 80, we find that it is necessary to make the integral 


fa fax pllogp —»— u(t +0? + w) 


as small as possible, without restrictions. If, however, we
replace p by p + δ and require that the integral which contains
the first power of δ shall vanish identically, we get the
solution

$$\log p = \lambda - 1 + \mu(u^2 + v^2 + w^2), \tag{238}$$
or
$$p = C e^{\mu(u^2 + v^2 + w^2)}. \tag{239}$$
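
The step from the integral to the solution (238) can be written out as a short worked equation. This is only a sketch of the first-variation argument: the name I[p] for the integral is introduced here for convenience and is not the text's notation, and δ denotes the arbitrary small increment given to p.

```latex
\begin{align*}
I[p] &= \int dU \int dX\; p\,\bigl[\log p - \lambda - \mu(u^2+v^2+w^2)\bigr],\\
I[p+\delta] - I[p] &= \int dU \int dX\; \delta\,\bigl[\log p + 1 - \lambda - \mu(u^2+v^2+w^2)\bigr] + O(\delta^2).
\end{align*}
```

Since δ is arbitrary, the bracket in the second line must vanish for every U and X, which gives log p = λ - 1 + μ(u² + v² + w²); taking exponentials then gives the exponential form with C = e^{λ-1}.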


There remains the matter of determining the arbitrary
constants C and μ. For this purpose we have the two relations
(236) and (237), both of which must be satisfied by our
function. Substituting (239) in (236), we observe at once that
the integrations with respect to u, v, w will give infinite results
unless μ is negative. But if μ is negative each of the three leads
to a factor √(-π/μ). This causes (236) to take the form

$$N = C\left(-\frac{\pi}{\mu}\right)^{3/2}\int dX.$$

As the integral of dX over all possible values is just the volume
of the container, we get

$$\frac{N}{V} = C\left(-\frac{\pi}{\mu}\right)^{3/2}$$

as our first condition.


By substituting (239) in (237) and carrying out an entirely
similar set of integrations we arrive at another relation

$$\frac{3}{2}\,\frac{N}{V}\,kT = -\frac{3m}{4\mu}\,C\left(-\frac{\pi}{\mu}\right)^{3/2}.$$

If we solve these equations and set the ratio N/V, which is
the number of molecules per unit volume, equal to ν, we
get the results

$$\mu = -\frac{m}{2kT} = -\alpha, \qquad C = \nu\left(\frac{m}{2\pi kT}\right)^{3/2} = \nu\left(\frac{\alpha}{\pi}\right)^{3/2}. \tag{240}$$

Let us now assure ourselves that this solution really satisfies
(230), for we must remember that we have obtained it, not by
solving (230) itself, but by making H a minimum. By
substituting (238) in (219) we obtain

$$\log p + \log p' = \log \bar{p} + \log \bar{p}',$$
or what amounts to the same thing
$$p\,p' = \bar{p}\,\bar{p}'.$$
Hence the right-hand side of (230) vanishes. That the left-hand
side also vanishes is obvious at once, since p is a function
of neither x, y, z nor t.

Finally, let us compare our solution with (73). We remember,
to begin with, that (73) is a formula for the probability
of a molecule being in a certain state, as is obvious from the
use of the asterisk. Moreover, it represents the chance of
the molecule having the velocity U, no matter where it may be,
whereas the p*(U, X, t) of (231) is the probability of a molecule
being in a particular place and having a particular velocity.
The relation between the two is obviously

$$p^*(U, t) = \int p^*(U, X, t)\,dX = \frac{1}{\nu}\,p(U, X, t).$$
Hence we get, from (239) and (240),

$$p^*(u, v, w) = \left(\frac{m}{2\pi kT}\right)^{3/2} e^{-\frac{m}{2kT}(u^2 + v^2 + w^2)}, \tag{241}$$

which is indeed identical with (73) provided we make
α = m/2kT, as we have done in writing (240).
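
As a numerical sanity check on the Maxwell density, the following Python sketch integrates one Cartesian factor of it by the midpoint rule and confirms that the total probability is 1 and that the expected kinetic energy per molecule is (3/2)kT when α = m/2kT. The molecular mass and temperature are illustrative assumptions (roughly nitrogen at room temperature, in SI units), and the function names are hypothetical.

```python
import math

def maxwell_density(u, v, w, alpha):
    """Per-molecule velocity density (alpha/pi)^(3/2) * exp(-alpha*(u^2+v^2+w^2))."""
    return (alpha / math.pi) ** 1.5 * math.exp(-alpha * (u * u + v * v + w * w))

def one_dim_moments(alpha, half_width=8.0, steps=4000):
    """Midpoint-rule estimates of the integrals of g(u) and u^2 g(u), where
    g(u) = sqrt(alpha/pi) * exp(-alpha*u^2) is one Cartesian factor of the density."""
    scale = 1.0 / math.sqrt(alpha)            # natural velocity scale
    lo, hi = -half_width * scale, half_width * scale
    du = (hi - lo) / steps
    norm = 0.0
    second = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * du
        g = math.sqrt(alpha / math.pi) * math.exp(-alpha * u * u)
        norm += g * du
        second += u * u * g * du
    return norm, second

# Illustrative values (assumptions, not taken from the text).
k = 1.380649e-23      # Boltzmann constant, J/K
m = 4.65e-26          # molecular mass, kg
T = 300.0             # temperature, K
alpha = m / (2 * k * T)

norm, second = one_dim_moments(alpha)
total_probability = norm ** 3                              # the 3-D integral factorizes
mean_kinetic_energy = 0.5 * m * 3 * second * norm ** 2     # (m/2) * E[u^2 + v^2 + w^2]

print(total_probability, mean_kinetic_energy / (k * T))
```

The three-dimensional integrals are never formed directly: because the density is a product of three identical one-dimensional factors, two one-dimensional sums are enough.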


§ 150. Pressure


When a molecule collides with the wall of the containing 
vessel, its component of momentum is reversed. To do this 
requires the exertion of a force upon the molecule, and of 
course imposes an equal force upon the wall. Theoretically, 
since we are assuming that both the molecules and the walls 
are perfectly hard, the change of momentum takes place 
instantaneously, and therefore the force necessary to bring it 
about must be infinite. In other words, at those instants
when molecules collide with the walls of the vessel, the wall is
subjected to an infinite force; at all other instants it is subject to
no force whatever.

We do not ordinarily think of the pressure of a gas as being 
of this nature: we think of a toy balloon, for instance, as 
being stretched by the steady exertion of an unvarying force 
upon every element of its surface. But in the Kinetic Theory 
the pressure is supposed to be the result of all these collisions, 
and to appear steady only because we are incapable of observ- 
ing the individual impulses. In other words, in the Kinetic 
Theory “pressure” is the expectation of the force per unit 
area of the surface. 

We may arrive at our result as follows: If a variable force
acts for a time T, its average value is, by definition

$$\bar{F} = \frac{1}{T}\int_0^T F\,dt.$$

Suppose, now, that such a force acts in the direction of the
x-axis upon a body of mass m. Then the relation between
force and acceleration shows us that

$$F = m\,\frac{du}{dt},$$

from which we readily find that
$$\bar{F}\,T = m u_T - m u_0,$$

u_0 and u_T being the velocities of the particle at the beginning
and end of the interval.

Stated in words this equation reads, “the average of a
variable force is equal to the change of momentum produced
in time T, divided by T.” The exact analogue of this,
expressed in terms of expectation instead of average, is:

“The pressure exerted upon the walls of a vessel containing 
a gas is equal to the expectation of the change of momentum 
per unit area per unit time.” 

As the pressure is the same in all directions, we may as
well consider an element of area perpendicular to the axis of x.
The probability of a molecule striking such an element with a
velocity U in time dt is obviously

$$p(U, X, t)\,u\,dA\,dt,$$

dA being the area of the element, and u being allowed to have
only positive values since there is gas on only one side of the
wall. Each such molecule undergoes a change of momentum
equal to 2mu. Hence the product of this change of momentum
by its probability of occurrence is 2mu² p(U, X, t) dA dt; and
summing this over every possible value of U (that is, over
every value for which u is positive), we get


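$$P\,dA\,dt = 2m\nu\left(\frac{m}{2\pi kT}\right)^{3/2} dA\,dt \int_0^{\infty} du \int_{-\infty}^{\infty} dv \int_{-\infty}^{\infty} dw\; u^2\, e^{-\frac{m}{2kT}(u^2 + v^2 + w^2)},$$

which works out to be P = νkT.
If we replace ν by N/V this becomes the well-known law of
perfect gases

$$PV = kNT:$$

the product of pressure by volume is proportional to
temperature.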
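
The pressure integral can be checked numerically. The sketch below, in Python, evaluates 2mν times the integral of u² over positive u only (the integrals over v and w each contribute a factor of 1) and compares it with νkT. The gas parameters are assumptions chosen for illustration: a nitrogen-like mass, 293 K, and 2.5·10¹⁹ molecules per cubic centimetre expressed per cubic metre. The function name is hypothetical.

```python
import math

def pressure_from_collisions(nu, m, k, T, half_width=8.0, steps=4000):
    """Evaluate 2*m*nu * integral over u > 0 of u^2 * g(u) du, where
    g(u) = sqrt(alpha/pi) * exp(-alpha*u^2) and alpha = m/(2kT)."""
    alpha = m / (2 * k * T)
    hi = half_width / math.sqrt(alpha)     # upper cut-off, far into the Gaussian tail
    du = hi / steps
    integral = 0.0
    for i in range(steps):
        u = (i + 0.5) * du                 # midpoint of each sub-interval
        integral += u * u * math.sqrt(alpha / math.pi) * math.exp(-alpha * u * u) * du
    return 2 * m * nu * integral

# Illustrative values (assumptions, not taken from the text).
k = 1.380649e-23    # Boltzmann constant, J/K
m = 4.65e-26        # molecular mass, kg
T = 293.0           # temperature, K
nu = 2.5e25         # molecules per cubic metre

p_numeric = pressure_from_collisions(nu, m, k, T)
p_closed = nu * k * T                      # the closed form P = nu*k*T

print(p_numeric, p_closed)
```

With these assumed values the pressure comes out close to one atmosphere, which is a useful plausibility check on the number density.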


§ 151. The Expectation of the Distance Travelled by a Molecule


From the probability (76) of a molecule having a given 
speed we can readily find the expectation of the distance 
travelled in time dt. For if the molecule is travelling with 
speed s it will travel a distance s dt during such an interval.
The expectation of this distance is therefore

$$4\,dt\,\sqrt{\frac{\alpha^3}{\pi}}\int_0^{\infty} s^3 e^{-\alpha s^2}\,ds = \frac{2\,dt}{\sqrt{\pi\alpha}}.$$

As the expectation of the distance is proportional to the time,
no matter how long or how short the interval may be, it is
not necessary to regard dt as an infinitesimal. We may
therefore say: In unit time a molecule may be expected to
travel a distance

$$\frac{2}{\sqrt{\pi\alpha}} = \sqrt{\frac{8kT}{\pi m}}. \tag{242}$$
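
A short Python sketch can confirm the expected distance per unit time, 2/√(πα) with α = m/2kT, by integrating s times the speed density 4√(α³/π) s² e^{-αs²} numerically. The mass and temperature are illustrative assumptions (nitrogen-like molecule, 300 K, SI units), and both function names are hypothetical.

```python
import math

def expected_speed_closed(m, k, T):
    """Closed form 2/sqrt(pi*alpha), which equals sqrt(8kT/(pi*m))."""
    alpha = m / (2 * k * T)
    return 2.0 / math.sqrt(math.pi * alpha)

def expected_speed_numeric(m, k, T, half_width=10.0, steps=20000):
    """Midpoint-rule integral of s * 4*sqrt(alpha^3/pi) * s^2 * exp(-alpha*s^2) over s > 0."""
    alpha = m / (2 * k * T)
    hi = half_width / math.sqrt(alpha)
    ds = hi / steps
    prefactor = 4.0 * math.sqrt(alpha ** 3 / math.pi)
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * ds
        total += prefactor * s ** 3 * math.exp(-alpha * s * s) * ds
    return total

# Illustrative values (assumptions, not taken from the text).
k = 1.380649e-23    # Boltzmann constant, J/K
m = 4.65e-26        # molecular mass, kg
T = 300.0           # temperature, K

speed_closed = expected_speed_closed(m, k, T)
speed_numeric = expected_speed_numeric(m, k, T)
speed_from_kT = math.sqrt(8 * k * T / (math.pi * m))

print(speed_closed, speed_numeric)
```

For the assumed values the expected speed is a few hundred metres per second.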




§ 152. Number of Collisions 


We have found in § 146 that the chance of a collision
between a U-class molecule and one of some other kind within
the element dX and during the time dt, is given by (226).
The expectation of the total number of collisions in the entire
container during dt is therefore the integral of this expression
over all possible values of U and X. But if we compute the
integral in this way we really count each collision twice, for
if we think of two special values U₁ and U₂ we count not only
U = U₁ and U′ = U₂, but also U = U₂ and U′ = U₁. We
must therefore take half the integral in order to get the correct
expectation. That is

$$\frac{1}{2}\,dt \int dU \int dX \int dU' \int da\; S\; p(U, X, t)\, p'(U', X', t).$$


The evaluation of this integral is a rather complicated 
piece of calculus, and it will be sufficient for us to state only 
the final result, which is 

$$8\,\nu^2 \rho^2 V \sqrt{\frac{\pi kT}{m}}, \tag{243}$$

where ρ is the radius of a molecule.

This, of course, is the total number of collisions that may be
expected to take place in the entire volume V. To find the
number in which any one molecule may be expected to partake,
we may notice that if each collision involved only one
molecule this would be just 1/N times (243). But since each
collision involves two molecules, the true answer is twice as
great as this. That is, each molecule can be expected to take
part in

$$16\,\nu \rho^2 \sqrt{\frac{\pi kT}{m}} \tag{244}$$

collisions per unit time.


§ 153. Expected (“Mean”) Free Path


We next ask how far a molecule may be expected to travel
between collisions. Having already found the distance it may
be expected to travel per unit time, and the number of collisions
it may be expected to have within that distance, the expected
free path can be very easily obtained by dividing (242) by
(244). The result is

$$\frac{\sqrt{2}}{8\pi\nu\rho^2}.$$

This is as far as we are justified in discussing the Kinetic 
Theory in an elementary text, and indeed we have introduced 
the major part of the mathematical ideas which underlie it. 
Beyond this point it becomes largely an application of these 
ideas to the discussion of various physical systems. We leave 
the Kinetic Theory, therefore, for a few simple problems of a 
slightly different type. 


§ 154. Density Fluctuations 


Problems of a sort related to those of the Kinetic Theory 
appear in various branches of science. One of the simpler 
ones is concerned with what may be briefly characterized as 
“density fluctuations.” It appears in many places. For 
example: 

Gas molecules are continually wandering into and out of 
any element of volume which we may choose to take under 
observation. The number in it therefore varies with the time, 
or, in other words, the density of the gas fluctuates from point 
to point and from time to time. 

If we blow some fine dust particles into a gas-filled vessel, 
they will wander about in a haphazard fashion because of the 
impacts which they receive from the molecules of the gas. 
The number of dust particles within our element of observation 
will also vary with the time. That is, the dust “density”
will vary.

If we observe a thin film of liquid under the microscope we
may see colloidal particles or living organisms wandering about
in it. If so, the number within any marked portion of the
field of vision fluctuates with the time.

Exactly this same phenomenon appears in so many ways
that it is natural to ask as to the nature of the fluctuations.
Specifically, therefore, we shall seek to find the probability
that a random observation will show exactly j particles within
the observed element of volume.

To solve this problem, let us introduce the symbol p_j dt
to represent the chance of a particle entering the element during
the interval dt, and p̄_j dt to represent the probability of one
leaving, the subscript j indicating the presence of j particles
at the beginning of the interval. Then if P(j, t) represents the
probability of just j particles at the instant t, it is a simple
matter to set up the equations

$$\begin{aligned}
P(0, t+dt) &= P(0,t)\,(1 - p_0\,dt) + P(1,t)\,\bar{p}_1\,dt,\\
P(j, t+dt) &= P(j-1,t)\,p_{j-1}\,dt + P(j,t)\,(1 - p_j\,dt - \bar{p}_j\,dt) + P(j+1,t)\,\bar{p}_{j+1}\,dt,\\
P(\lambda, t+dt) &= P(\lambda-1,t)\,p_{\lambda-1}\,dt + P(\lambda,t)\,(1 - \bar{p}_\lambda\,dt);
\end{aligned}$$

the last of these being written on the assumption that only
λ particles are available, so that the chance of a larger number
in the element is zero. If the number may be regarded as
infinite, the last equation need not be considered.


We can readily reduce these to the form of differential
equations and obtain

$$\begin{aligned}
\frac{dP(0)}{dt} &= -p_0\,P(0) + \bar{p}_1\,P(1),\\
\frac{dP(j)}{dt} &= p_{j-1}\,P(j-1) - (p_j + \bar{p}_j)\,P(j) + \bar{p}_{j+1}\,P(j+1),\\
\frac{dP(\lambda)}{dt} &= p_{\lambda-1}\,P(\lambda-1) - \bar{p}_\lambda\,P(\lambda).
\end{aligned} \tag{245}$$

It is no longer necessary to write the time explicitly, since we
shall have no further occasion to refer to the instant t + dt.
If we now assume that the system is in statistical equilibrium,
the derivatives vanish from (245), leaving a set of linear
forms the solution of which can easily be found. It is,

$$P(j) = \frac{\Pi_j}{\sum_j \Pi_j}, \tag{246}$$

where
$$\Pi_j = \frac{p_0\,p_1 \cdots p_{j-1}}{\bar{p}_1\,\bar{p}_2 \cdots \bar{p}_j}$$
and the summation is to be extended to every possible value of j.

This, then, is a perfectly general formula that applies to all
such problems, no matter what the shape of the volume element
may be, and no matter whether the entrances and exits are
influenced by the presence of other particles or not; for we have
introduced no geometrical ideas into our discussion which
would cause the shape of the volume element to invalidate it,
and we have taken account of the fact that p and p̄ might be
different for different values of j.
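
The general equilibrium formula says that P(j) is proportional to the product of the entrance probabilities for 0, 1, ..., j - 1 particles divided by the product of the exit probabilities for 1, 2, ..., j particles. A minimal Python sketch of that rule follows. The function name, the truncation limit j_max, and the particular rates used in the example (a constant entrance rate 1/a and an exit rate j/(a·eps), with eps the expected number) are assumptions made for illustration.

```python
import math

def stationary_distribution(enter, leave, j_max):
    """Return [P(0), ..., P(j_max)] with P(j) proportional to
    enter(0)*...*enter(j-1) / (leave(1)*...*leave(j))."""
    weights = [1.0]                                   # empty product for j = 0
    for j in range(1, j_max + 1):
        weights.append(weights[-1] * enter(j - 1) / leave(j))
    total = sum(weights)
    return [w / total for w in weights]

# Example rates (assumed): entrances independent of j, exits proportional to j.
eps = 4.0       # expected number of particles in the element
a = 1.0         # time scale; it cancels out of the equilibrium result
j_max = 80      # truncation, far beyond any appreciable probability

P = stationary_distribution(lambda j: 1.0 / a,
                            lambda j: j / (a * eps),
                            j_max)

# Poisson probabilities with the same mean, for comparison.
poisson = [eps ** j * math.exp(-eps) / math.factorial(j) for j in range(j_max + 1)]
largest_gap = max(abs(p - q) for p, q in zip(P, poisson))

print(largest_gap)
```

With these particular rates the equilibrium distribution reduces to the Poisson form, and the time scale a drops out entirely, as the ratio of successive weights shows.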

Let us now pass to the more specific case of a gas, the
molecules of which obey Maxwell’s Law. We shall assume
that we are dealing with an element of perfectly arbitrary
shape, the volume of which is V and the superficial area A.
A molecule that was within this element at the time t will have
passed out at the time t + dt if it lay within a distance u dt
of the surface, and possessed a velocity component u normal
to it. We cannot specify the direction of the normal exactly,
of course, for the element is supposed to be of purely arbitrary
shape. But this is of no consequence, since we know that the
chance of a molecule having a velocity component u in a
given direction is the same for all directions. It is, in fact,
given by (77). As for the chance of it lying within the necessary
distance of the surface, that is just the ratio of the volume
of the shell within which it must lie, to the total volume V.
That is, it is A u dt/V. We have, then, for the chance of a
molecule passing over the boundary during the interval dt,
provided it is within the element when that interval begins,
the formula

$$\frac{A\,dt}{V}\int_0^{\infty} u\,p^*(u)\,du = \frac{A\,dt}{2V\sqrt{\pi\alpha}}.$$


Of course, if there are j molecules in the element to begin
with, the chance of one passing out during dt is just j times as
great. Hence we have

$$\bar{p}_j\,dt = \frac{jA\,dt}{2V\sqrt{\pi\alpha}}. \tag{247}$$


As for the chance of a molecule passing into V, to do that
it must lie within a distance u dt of the boundary on the outside
if it has a normal velocity u. But by (239) and (240) the
chance of a molecule of velocity U within such an element of
volume is

$$\nu\left(\frac{\alpha}{\pi}\right)^{3/2} e^{-\alpha(u^2 + v^2 + w^2)}\,A\,u\,dt,$$

wherefore, upon integrating over every possible value of v
and w (that is, of the tangential components) and over those
values of u which are directed toward our element instead of
away from it, we have

$$p_j\,dt = \nu A\,dt\left(\frac{\alpha}{\pi}\right)^{3/2}\int_0^{\infty} du \int_{-\infty}^{\infty} dv \int_{-\infty}^{\infty} dw\; u\,e^{-\alpha(u^2 + v^2 + w^2)} = \frac{\nu A\,dt}{2\sqrt{\pi\alpha}}. \tag{248}$$


If we now introduce these results into (246), and note that
νV is just the expected number of particles in volume V, we
get at once

$$\Pi_j = \frac{(\nu V)^j}{j!}.$$
For the denominator we have, if λ is regarded as infinite,¹
$$\sum_j \Pi_j = e^{\nu V},$$
whence
$$P(j) = \frac{(\nu V)^j\,e^{-\nu V}}{j!},$$

which is just the familiar Poisson Law.


¹ We have already tacitly assumed the number of molecules to be infinite in deriving
the formula for p. For if there are only λ molecules, where λ is at all comparable with
j, the presence of j of them within the element V influences the chance of another
being within the shell from which new entrants come. The reader will have no
difficulty in revising the formula to deal with such a case.




It may be of interest to make use of this result to indicate
how large the density fluctuations are within a small portion
of gas. The density of the gas is, of course, proportional to
the number of molecules j. If an element contains j = ε + δ
molecules, therefore, instead of the expected ε, its density is
just j/ε = 1 + δ/ε times the expected value.

We consider a cubical element of volume, 0.01 cm. on a side,
within a gas at room temperature and atmospheric pressure.
It is known that in such a gas the number of molecules per
cubic centimeter is about 2.5·10¹⁹. This is our value of ν.
As V = 10⁻⁶ it follows that the expected number of molecules
within this small volume is ε = 2.5·10¹³. How large, now,
have we reason to expect the fluctuations in this number to be?

In our discussion of the fit of statistical data we have
learned to regard the standard deviation as an indication of
the spread of our distribution function, and we find from
Appendix X that the standard deviation of the Poisson Formula
is just √ε. In our illustration this works out to be 5,000,000.
Values of δ of the order of magnitude of 5,000,000 may therefore
be regarded as quite usual. But a deviation from expectation
of this magnitude in the number of molecules contained
in the element leads to a density only 1.0000002 times its
normal value.¹ So in volume elements of the size under
consideration the density fluctuations are very slight indeed.
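
The arithmetic of this illustration is easy to reproduce. The Python sketch below recomputes the expected number of molecules, its standard deviation (the square root of the expected number, as for any Poisson distribution) and the resulting density ratio for the cube 0.01 cm on a side. For contrast it repeats the calculation for a much smaller cube; the side of 5·10⁻⁵ cm chosen there, roughly a wave-length of visible light, is an assumption made for illustration, as are the variable and function names.

```python
import math

def fluctuation_summary(nu, side_cm):
    """Return (expected count, standard deviation, typical density ratio)
    for a cube of the given side in a gas of nu molecules per cm^3."""
    volume = side_cm ** 3                  # cm^3
    expected = nu * volume                 # the Poisson mean
    sigma = math.sqrt(expected)            # the Poisson standard deviation
    return expected, sigma, 1.0 + sigma / expected

nu = 2.5e19                                # molecules per cm^3, as in the text

eps, sigma, ratio = fluctuation_summary(nu, 0.01)

# A cube comparable to a wave-length of light (assumed side length).
eps_small, sigma_small, ratio_small = fluctuation_summary(nu, 5.0e-5)

print(eps, sigma, ratio)
print(eps_small, sigma_small, ratio_small)
```

The relative size of the fluctuation is 1/√(expected count), so shrinking the cube's side by a factor of 200 multiplies it by 200^(3/2), a little under 3000.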

If, however, we were to consider a cube the dimensions of 
which were comparable to a wave-length of light we would 
find that the density fluctuations were appreciable, and if we 
were to deal with a fluid in which colloidal particles were sus- 
pended there might be considerable fluctuations in even larger 
elements. Such fluctuations are believed to be the cause of the 
optical phenomenon known as opalescence. 


¹ Or to the reciprocal of this value if δ is negative.


§ 155. The Rapidity of Density Fluctuations


It may appear strange that the results of the last paragraph
are entirely independent of the shape of the volume element
which we have under discussion, since the probability of a
molecule either entering or leaving the element is proportional
to the superficial area. Our result says, in fact, that if we
were to immerse a cubical box, with only a pinhole in its walls,
within a large flask of gas, large deviations from expectation
would be just as likely as if the element were bounded only
by imaginary walls through which the molecules could pass
with absolute freedom. There must, however, be some difference
between the two cases, and it is the purpose of the present
section to discover what it is.

We return to the general differential equation (245) and
insert the values of p̄ and p obtained in (247) and (248).
Writing 1/a in place of νA/2√(πα), so that p_j = 1/a and
p̄_j = j/aε, we have

$$a\,\frac{dP(j)}{dt} = P(j-1) - \left(1 + \frac{j}{\varepsilon}\right)P(j) + \frac{j+1}{\varepsilon}\,P(j+1).$$

This is the differential equation which our system must
satisfy during the period when it is returning to normal after
its statistical equilibrium has been somehow upset. We
ought, therefore, to be able to learn from it something about
the speed with which such a return to normal takes place.

The constant a occurs as a divisor of the entire right-hand
side of this equation; so if we were to introduce a new unit
of time defined by the relation

$$t = a\tau$$

we would arrive at a differential equation (and therefore also
at a solution) which was entirely independent of a. In other
words, if we are dealing with two elements of volume, within
each of which the expected number of molecules is ε but for
which the areas are different, so that the value of a is not the
same for both, and if in speaking of what goes on in the first
element we measure time in terms of a unit which is a₁ seconds
in length, while for the second element we use a unit which is
a₂ seconds long, then in terms of these units the one system will
recover from an abnormal condition in just the same time as
the other. But of course this means that the one which has
the larger unit of time will actually take a larger number of
seconds, in proportion to the relative magnitudes of the two
a’s. Remembering the definition of a, we see that this is
equivalent to the statement:


Though the MAGNITUDE of statistical density fluctuations is
dependent only upon the expected density and is influenced
neither by the shape of the volume element nor by the area of the
surface across which migrations can take place, the RAPIDITY
of these fluctuations varies directly as the area of this bounding
surface.


It is the rapidity of the fluctuation, then, and not its 
magnitude, which is dependent upon the shape of the boundary. 
If we return to our specific illustration of the cubical elements 
of gas, one bounded by a physical container perforated by a 
pinhole and the other bounded only by a mathematical sur- 
face, and suppose that at some instant both are completely 
empty, it is evident that the mathematical one would fill up 
much more rapidly than the other because its effective area is 
so much greater. This is quite in agreement with our common 
sense. But if we were to make many random observations 
upon each of them, after they had reached the condition of 
statistical equilibrium, we could expect the same proportion 
of those observations to show a given deviation from the 
expected density in each case. 

We still do not have a definite measure of the number of 
seconds required to return to equilibrium, and indeed the 
attempt to obtain such a definite measure is a difficult one. 
There are two reasons for this. In the first place, the return 
to equilibrium is asymptotic in character, and it is not a simple 
matter to specify what the “end” of the process shall be. 
In the second place, an exact answer to the question would 
require the solution of the set of equations (245), which is by 
no means simple. 




We can, however, get some insight into the matter by noting
that a solution of (245) can be obtained in the form

$$P(j) = \frac{\varepsilon^j e^{-\varepsilon}}{j!} + c_1\,e^{-t/a\varepsilon} + c_2\,e^{-2t/a\varepsilon} + \cdots$$

The complete solution will, of course, consist of as many
equations of this sort as there are values of j, and the c’s
will be different for each of them. Moreover, these c’s will
depend upon the nature of the statistical upset from which
the system is recovering. But the nature of the exponential
terms will always be the same, no matter what value of j we
may consider, nor what the statistical abnormality may have
been. Hence the terms which contain the c’s all die out
exponentially in such a way as to reach 1/e times their original
value in aε seconds or less, and 1 per cent of their original
value in a little less than five times as long. Therefore,
unless the statistical upset has been of such a nature as to
make the c’s inordinately large, of which we can only be sure
by actually solving the equations, the time of recovery may
very properly be said to be a small multiple of aε. This is as
far as we can go without extensive computation.
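
The return to equilibrium can be watched numerically. The Python sketch below integrates the differential equations of this section by the forward-Euler method for the gas case, with entrance probability 1/a per unit time and exit probability j/(a·eps) when j particles are present, starting from an empty element. Under these rates the expected number obeys a d(mean)/dt = 1 - mean/eps, so it should approach eps with time constant a·eps. The values of eps and a, the truncation limit and the step size are arbitrary assumptions, and Euler integration is a choice of convenience rather than the text's method.

```python
import math

def relax(eps, a, t_end, dt, j_max):
    """Forward-Euler integration of
    a dP(j)/dt = P(j-1) - (1 + j/eps) P(j) + ((j+1)/eps) P(j+1),
    truncated at j_max, starting from an empty element (P(0) = 1)."""
    P = [0.0] * (j_max + 1)
    P[0] = 1.0
    for _ in range(int(round(t_end / dt))):
        new = [0.0] * (j_max + 1)
        for j in range(j_max + 1):
            from_below = P[j - 1] if j > 0 else 0.0
            from_above = (j + 1) / eps * P[j + 1] if j < j_max else 0.0
            entering_loss = P[j] if j < j_max else 0.0   # no entrances past the truncation limit
            leaving_loss = j / eps * P[j]
            new[j] = P[j] + (dt / a) * (from_below - entering_loss - leaving_loss + from_above)
        P = new
    return P

def mean_count(P):
    """Expected number of particles under the distribution P."""
    return sum(j * p for j, p in enumerate(P))

eps, a = 5.0, 2.0            # assumed expected count and time scale
tau = a * eps                # predicted time constant of the mean
P_one_tau = relax(eps, a, tau, tau / 2000, 60)
P_late = relax(eps, a, 5 * tau, tau / 2000, 60)

expected_after_one_tau = eps * (1.0 - math.exp(-1.0))
print(mean_count(P_one_tau), expected_after_one_tau, mean_count(P_late))
```

Doubling a (that is, halving the bounding area) doubles the time needed to reach the same state but leaves the final distribution unchanged, which is the distinction between rapidity and magnitude drawn above.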


§ 156. The Schottky Effect 


In a vacuum tube the current which passes from the 
filament to the plate is a migration of discrete particles which 
possesses regularity only in a statistical sense. Either the 
discrete nature of the particles or the inequality in the time 
intervals between them would be sufficient to affect any 
receiving circuit that could distinguish between steady and 
variable currents. The magnitude of the effect might con- 
ceivably be very small, but if it were sufficiently amplified it 
could be made to actuate a telephone receiver. It would then 
manifest itself as an audible noise. 

If some other fluctuation, such as an attenuated telephone 
message, were superposed on the original space current, it 
would be amplified to exactly the same degree as the statistical 
fluctuations and would likewise result in an audible signal. 




The intelligibility of the message thus received would then 
depend upon the ratio of the signal amplitude to the amplitude 
of the statistical variations. If this ratio were too small, the 
noise would mask the signal and the result would be worthless 
for communication purposes. In other words, even with 
perfect amplifiers, there is a lower limit of intensity below 
which electrical signals cannot be detected by vacuum tubes 
because of the lumpiness of the space current in the first tube. 

As a final illustration of fluctuation phenomena we shall
consider this problem, with a view to finding how much more
electrical energy would be dissipated in a measuring circuit
because of these statistical fluctuations than would be the
case if the stream were perfectly steady.

We shall assume: 


(a) That the system obeys the “law of superposition,”
which means only that the current which flows in response to
the sum of two electromotive forces simultaneously applied is
the sum of the currents which would result if each force were
applied separately;

(b) That the emitting and measuring systems are in
“statistical equilibrium”;

(c) That the instants at which electrons emerge are
distributed individually and collectively at random;

(d) That the magnitude of the Schottky effect is defined to
be the difference between the expectation of the power in the
measuring device, and the power that would be dissipated if all
the irregularities of the electron stream were smoothed out.


Consider a time interval T so long that the cumulative
current and electromotive force due to all those electrons
which were emitted before the interval began have practically
vanished by the time it ends.¹ At the instant when this
interval ends there is a power EI in the measuring device.
Its value is unknown, of course, but we shall find that we can
compute its expectation.


¹ It may be noted in passing that (mathematically) non-dissipative measuring
devices are ruled out by this condition. Such devices are not used by the physicist,
however. Even the idealized non-dissipative circuits which he deals with in theoretical
studies are the limits toward which dissipative circuits approach as the dissipation is
caused to vanish. Such circuits can be treated by the methods here used.

We denote by p(n) the chance of just n electrons passing
during this interval T. Due to assumption (c) we can write a
formula for it at once as

$$p(n) = \frac{(\nu T)^n\,e^{-\nu T}}{n!},$$

where ν is the rate of emission. We also denote by P_n(EI)
the conditional probability of a power exactly equal to EI
at the end of the interval T, if just n electrons were emitted.
In terms of these symbols the expected power is obviously

$$\varepsilon(EI) = \sum EI\;p(n)\,P_n(EI),$$

the summation being extended to every possible value of n
and EI. Since E and I are continuous variables and n is a
discontinuous one, this requires the use of both integration
and algebraic summation, and takes the form

$$\varepsilon(EI) = \sum_n p(n)\int dE \int dI\; EI\;P_n(E, I). \tag{249}$$

We shall first consider the integral terms of this expression,
taken alone. As they form a sort of “conditional expectation”
we shall represent them by the symbol¹ ε_n(EI).

We now note that, if we begin with a group of s points,
placed at random upon a line segment, and to them add
s′ more, also placed at random, the result is a random group
of s + s′ points. In other words, the superposition of two
random groups upon a line produces nothing else than a larger
random group. In particular, we might start with a group
of n − 1 and add to it another group of 1, both groups being
placed at random. The result would then be a random group
of n. Conversely, we may regard any group of n as having
been formed in this way.

¹ This can be done without danger of confusion with an nth expectation, since
we have in this problem no need for expectations higher than the first.


Let us think, then, of the n electrons which have been
emitted within an interval T as constituting two groups, one
n − 1 in number and the other consisting of a single electron.
In addition, let us denote the electromotive forces and currents
due to these sets separately by E₁, I₁ and E₂, I₂, and their
respective probabilities by P_{n−1}(E₁, I₁) and P₁(E₂, I₂). By
assumption (a) the aggregate current and electromotive force
due to the superposition of the two are I₁ + I₂ and E₁ + E₂,
wherefore the instantaneous power must be (E₁ + E₂)(I₁ + I₂).
Using these values we obtain at once the formula

$$\varepsilon_n(EI) = \int dE_1 \int dE_2 \int dI_1 \int dI_2\,\bigl[E_1 I_1 + E_2 I_1 + E_1 I_2 + E_2 I_2\bigr]\,P_{n-1}(E_1, I_1)\,P_1(E_2, I_2),$$

the limits of integration being, in each case, from −∞ to +∞.

Let us now inspect the four terms of this integral separately.
So far as the first is concerned, E₂ and I₂ occur only in
P₁(E₂, I₂), the integral of which over all possible values of its
arguments must give unity. The entire first integral therefore
reduces to

$$\int dE_1 \int dI_1\; E_1 I_1\,P_{n-1}(E_1, I_1),$$

which is by definition ε_{n−1}(EI), since the subscripts serve no
other purpose than to keep our variables distinct.

In a similar fashion, the fourth term reduces immediately
to ε₁(EI).

As for the second term, it can be separated into the product
of the two expressions

$$\int dE_2 \int dI_2\; E_2\,P_1(E_2, I_2)$$
and
$$\int dE_1 \int dI_1\; I_1\,P_{n-1}(E_1, I_1),$$

which are obviously the expected value of E at the end of an
interval in which only one electron was emitted, and the
expected value of I at the end of an interval in which n − 1
were emitted. We denote these by ε₁(E) and ε_{n−1}(I).

Similarly the third term of the integral leads to ε₁(I) and
ε_{n−1}(E).

Collecting these results we have

$$\varepsilon_n(EI) = \varepsilon_{n-1}(EI) + \varepsilon_1(E)\,\varepsilon_{n-1}(I) + \varepsilon_{n-1}(E)\,\varepsilon_1(I) + \varepsilon_1(EI). \tag{250}$$
By virtue of assumption (a) it follows at once, however, that
$$\varepsilon_{n-1}(E) = (n-1)\,\varepsilon_1(E), \qquad \varepsilon_{n-1}(I) = (n-1)\,\varepsilon_1(I);$$
whence (250) becomes
$$\varepsilon_n(EI) = \varepsilon_{n-1}(EI) + \varepsilon_1(EI) + 2(n-1)\,\varepsilon_1(E)\,\varepsilon_1(I);$$

from which, by setting n equal to 2, 3, 4, ... in succession, we
easily derive the general formula

$$\varepsilon_n(EI) = n\,\varepsilon_1(EI) + n(n-1)\,\varepsilon_1(E)\,\varepsilon_1(I). \tag{251}$$
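
The passage from the step-by-step relation to the general formula can be checked mechanically. The Python sketch below builds the conditional expectation for n electrons by repeatedly adding the one-electron power plus 2(n − 1) times the product of the one-electron expectations of E and I, and compares the result with n times the one-electron power plus n(n − 1) times that product. Exact rational arithmetic is used so that the comparison is an identity rather than an approximation; the three one-electron values are arbitrary test numbers, and the function names are hypothetical.

```python
from fractions import Fraction

def conditional_power_by_recursion(n, e1_EI, e1_E, e1_I):
    """Apply eps_n = eps_{n-1} + eps_1(EI) + 2(n-1) eps_1(E) eps_1(I), starting at n = 1."""
    value = e1_EI
    for count in range(2, n + 1):
        value = value + e1_EI + 2 * (count - 1) * e1_E * e1_I
    return value

def conditional_power_closed_form(n, e1_EI, e1_E, e1_I):
    """General formula n*eps_1(EI) + n(n-1)*eps_1(E)*eps_1(I)."""
    return n * e1_EI + n * (n - 1) * e1_E * e1_I

# Arbitrary one-electron expectations (test values only, no physical meaning).
e1_EI = Fraction(7, 3)
e1_E = Fraction(2, 5)
e1_I = Fraction(3, 11)

agreement = all(
    conditional_power_by_recursion(n, e1_EI, e1_E, e1_I)
    == conditional_power_closed_form(n, e1_EI, e1_E, e1_I)
    for n in range(1, 51)
)
print(agreement)
```

The agreement rests on the elementary sum 2(1 + 2 + ... + (n − 1)) = n(n − 1).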


The computation of ε(EI) is now possible. We replace
the integral terms of (249) by the expression (251) to which
we have succeeded in reducing them and evaluate the summation,
thus arriving at the final result

$$\varepsilon(EI) = (\nu T)\,\varepsilon_1(EI) + (\nu T)^2\,\varepsilon_1(E)\,\varepsilon_1(I). \tag{252}$$
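
The summation over n can also be verified numerically. The following Python sketch weights n·ε₁(EI) + n(n − 1)·ε₁(E)·ε₁(I) by the Poisson probability of n emissions with mean νT and adds the terms up; the total should equal νT·ε₁(EI) + (νT)²·ε₁(E)·ε₁(I). The emission rate, interval and one-electron expectations are arbitrary assumed numbers, the series is cut off at a hypothetical limit n_max well beyond the bulk of the distribution, and the function name is hypothetical.

```python
import math

def expected_power_by_summation(rate, T, e1_EI, e1_E, e1_I, n_max=400):
    """Sum p(n) * [n*e1_EI + n(n-1)*e1_E*e1_I] over n = 0..n_max,
    with p(n) the Poisson probability of n emissions, mean rate*T."""
    mean_count = rate * T
    p = math.exp(-mean_count)              # p(0)
    total = 0.0
    for n in range(n_max + 1):
        total += p * (n * e1_EI + n * (n - 1) * e1_E * e1_I)
        p *= mean_count / (n + 1)          # p(n+1) from p(n)
    return total

# Arbitrary test values (assumptions, no physical meaning).
rate, T = 3.0, 10.0
e1_EI, e1_E, e1_I = 0.7, 0.2, 0.05

summed = expected_power_by_summation(rate, T, e1_EI, e1_E, e1_I)
closed = rate * T * e1_EI + (rate * T) ** 2 * e1_E * e1_I

print(summed, closed)
```

The two Poisson facts used are that the expectation of n is νT and the expectation of n(n − 1) is (νT)².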


This equation actually contains the solution of our problem.
All that remains is to interpret it. This we can easily do by
noting that the instant at which our interval ends is itself of
the nature of a random observation upon the system. Suppose,
however, that we have any function of the time f(t),
which extends over an interval of length T. If we choose a
time at random and observe this function, we are as likely to
land in any element dt as any other. Hence the expectation
of the value of f shown by our observation is

$$\varepsilon(f) = \frac{1}{T}\int_0^T f(t)\,dt,$$

which is just the height of a rectangle the area of which is
the same as the area under f(t). In other words, ε(f) is the
value which f(t) would have if all its fluctuations were smoothed
out.
In the case of the functions EI, E and I in (252), where T
has been chosen much longer than the physical duration of the
pulse due to an electron, the integral from 0 to T includes
the entire area under the curve in question. In other words,
T ε₁(EI) is the total energy which would be dissipated by a
single electron if no other were ever emitted, which we may
call w₁; while νT ε₁(E) and νT ε₁(I) are the values which the
voltage and current due to νT electrons would have, if all the
fluctuations were smoothed out. Their product, then, is the
constant power that would exist under these circumstances.
We denote it by Π₀, thus reaching the result

$$\varepsilon(EI) = \nu w_1 + \Pi_0.$$

According to the definition contained in (d), the difference
between this expected power and Π₀ is just the magnitude of
the Schottky effect for which we are seeking. Thus we have,

$$\varepsilon(EI) - \Pi_0 = \nu w_1.$$
Stated in words this simple expression says,


If the receiving circuit is of such a nature that the emission of
a single electron would cause the dissipation of w₁ units of energy
in it in the absence of all other electrons, and if electrons are
being emitted at the rate of ν per unit time, the power in the circuit
exceeds that which would be produced by the emission of a
perfectly steady stream of electricity by the amount νw₁.


This is a remarkably simple result for such a complicated
piece of reasoning; and incidentally one which fits in well with
the experimental requirements of the physicist, for in order
to determine the quantity w₁ he need only subject his measuring
circuit to a shock such as that which an electron would give it,
but on a larger scale, and measure the total amount of energy
dissipated as a result. Reduced to the scale of a single electron,
and multiplied by the rate of emission ν, it gives him at once
the magnitude of the Schottky effect.




REFERENCES FOR OUTSIDE READING
I. The Kinetic Theory:
1. JEANS: The Dynamical Theory of Gases.
2. WATSON: A Treatise on the Kinetic Theory of Gases.
3. BOLTZMANN: Vorlesungen über Gastheorie.
II. Fluctuation Phenomena in General:
4. FÜRTH: Schwankungserscheinungen in der Physik.
III. The Schottky Effect:
5. SCHOTTKY: Über spontane Stromschwankungen in verschiedenen
Elektrizitätsleitern, Annalen der Physik,
Vol. 57 (1918), pp. 541-567.
6. FÜRTH: Die Bestimmung der Elektronenladung aus dem
Schroteffekt an Glühkathodenröhren, Physikalische Zeitschrift,
Vol. 23 (1922), pp. 354-362.
7. FRY: The Theory of the Schroteffekt, Journal of the Franklin
Institute, Vol. 199 (1925), pp. 203-220.




APPENDICES 


I. THE FACTORIALS OF INTEGERS

n | n! || n | n! || n | n!
1 | 1 || 36 | 3.719 9333 × 10^41 || 71 | 8.504 7859 × 10^101
2 | 2 || 37 | 1.376 3753 × 10^43 || 72 | 6.123 4458 × 10^103
3 | 6 || 38 | 5.230 2262 × 10^44 || 73 | 4.470 1155 × 10^105
4 | 24 || 39 | 2.039 7882 × 10^46 || 74 | 3.307 8854 × 10^107
5 | 1.20 × 10^2 || 40 | 8.159 1528 × 10^47 || 75 | 2.480 9141 × 10^109
6 | 7.20 × 10^2 || 41 | 3.345 2527 × 10^49 || 76 | 1.885 4947 × 10^111
7 | 5.040 × 10^3 || 42 | 1.405 0061 × 10^51 || 77 | 1.451 8309 × 10^113
8 | 4.032 0 × 10^4 || 43 | 6.041 5263 × 10^52 || 78 | 1.132 4281 × 10^115
9 | 3.628 80 × 10^5 || 44 | 2.658 2716 × 10^54 || 79 | 8.946 1821 × 10^116
10 | 3.628 800 × 10^6 || 45 | 1.196 2222 × 10^56 || 80 | 7.156 9457 × 10^118
11 | 3.991 6800 × 10^7 || 46 | 5.502 6222 × 10^57 || 81 | 5.797 1260 × 10^120
12 | 4.790 0160 × 10^8 || 47 | 2.586 2324 × 10^59 || 82 | 4.753 6433 × 10^122
13 | 6.227 0208 × 10^9 || 48 | 1.241 3916 × 10^61 || 83 | 3.945 5240 × 10^124
14 | 8.717 8291 × 10^10 || 49 | 6.082 8186 × 10^62 || 84 | 3.314 2401 × 10^126
15 | 1.307 6744 × 10^12 || 50 | 3.041 4093 × 10^64 || 85 | 2.817 1041 × 10^128
16 | 2.092 2790 × 10^13 || 51 | 1.551 1188 × 10^66 || 86 | 2.422 7095 × 10^130
17 | 3.556 8743 × 10^14 || 52 | 8.065 8175 × 10^67 || 87 | 2.107 7573 × 10^132
18 | 6.402 3737 × 10^15 || 53 | 4.274 8833 × 10^69 || 88 | 1.854 8264 × 10^134
19 | 1.216 4510 × 10^17 || 54 | 2.308 4370 × 10^71 || 89 | 1.650 7955 × 10^136
20 | 2.432 9020 × 10^18 || 55 | 1.269 6403 × 10^73 || 90 | 1.485 7160 × 10^138
21 | 5.109 0942 × 10^19 || 56 | 7.109 9859 × 10^74 || 91 | 1.352 0015 × 10^140
22 | 1.124 0007 × 10^21 || 57 | 4.052 6920 × 10^76 || 92 | 1.243 8414 × 10^142
23 | 2.585 2017 × 10^22 || 58 | 2.350 5613 × 10^78 || 93 | 1.156 7725 × 10^144
24 | 6.204 4840 × 10^23 || 59 | 1.386 8312 × 10^80 || 94 | 1.087 3662 × 10^146
25 | 1.551 1210 × 10^25 || 60 | 8.320 9871 × 10^81 || 95 | 1.032 9978 × 10^148
26 | 4.032 9146 × 10^26 || 61 | 5.075 8021 × 10^83 || 96 | 9.916 7793 × 10^149
27 | 1.088 8869 × 10^28 || 62 | 3.146 9973 × 10^85 || 97 | 9.619 2760 × 10^151
28 | 3.048 8834 × 10^29 || 63 | 1.982 6083 × 10^87 || 98 | 9.426 8904 × 10^153
29 | 8.841 7620 × 10^30 || 64 | 1.268 8693 × 10^89 || 99 | 9.332 6215 × 10^155
30 | 2.652 5286 × 10^32 || 65 | 8.247 6506 × 10^90 || 100 | 9.332 6215 × 10^157
31 | 8.222 8387 × 10^33 || 66 | 5.443 4494 × 10^92 || 101 | 9.425 9478 × 10^159
32 | 2.631 3084 × 10^35 || 67 | 3.647 1111 × 10^94 || 102 | 9.614 4667 × 10^161
33 | 8.683 3176 × 10^36 || 68 | 2.480 0355 × 10^96 || 103 | 9.902 9007 × 10^163
34 | 2.952 3280 × 10^38 || 69 | 1.711 2245 × 10^98 || 104 | 1.029 9017 × 10^166
35 | 1.033 3148 × 10^40 || 70 | 1.197 8572 × 10^100 || 105 | 1.081 3968 × 10^168


n | n! || n | n! || n | n!
106 | 1.146 2806 × 10^170 || 141 | 1.898 1438 × 10^243 || 176 | 1.979 0311 × 10^320
107 | 1.226 5202 × 10^172 || 142 | 2.695 3641 × 10^245 || 177 | 3.502 8851 × 10^322
108 | 1.324 6418 × 10^174 || 143 | 3.854 3707 × 10^247 || 178 | 6.235 1354 × 10^324
109 | 1.443 8596 × 10^176 || 144 | 5.550 2938 × 10^249 || 179 | 1.116 0892 × 10^327
110 | 1.588 2455 × 10^178 || 145 | 8.047 9261 × 10^251 || 180 | 2.008 9606 × 10^329
111 | 1.762 9526 × 10^180 || 146 | 1.174 9972 × 10^254 || 181 | 3.636 2187 × 10^331
112 | 1.974 5069 × 10^182 || 147 | 1.727 2459 × 10^256 || 182 | 6.617 9181 × 10^333
113 | 2.231 1927 × 10^184 || 148 | 2.556 3239 × 10^258 || 183 | 1.211 0790 × 10^336
114 | 2.543 5597 × 10^186 || 149 | 3.808 9226 × 10^260 || 184 | 2.228 3854 × 10^338
115 | 2.925 0937 × 10^188 || 150 | 5.713 3840 × 10^262 || 185 | 4.122 5130 × 10^340
116 | 3.393 1087 × 10^190 || 151 | 8.627 2098 × 10^264 || 186 | 7.667 8741 × 10^342
117 | 3.969 9372 × 10^192 || 152 | 1.311 3359 × 10^267 || 187 | 1.433 8925 × 10^345
118 | 4.684 5258 × 10^194 || 153 | 2.006 3439 × 10^269 || 188 | 2.695 7178 × 10^347
119 | 5.574 5858 × 10^196 || 154 | 3.089 7696 × 10^271 || 189 | 5.094 9067 × 10^349
120 | 6.689 5029 × 10^198 || 155 | 4.789 1429 × 10^273 || 190 | 9.680 3227 × 10^351
121 | 8.094 2985 × 10^200 || 156 | 7.471 0629 × 10^275 || 191 | 1.848 9416 × 10^354
122 | 9.875 0442 × 10^202 || 157 | 1.172 9569 × 10^278 || 192 | 3.549 9679 × 10^356
123 | 1.214 6304 × 10^205 || 158 | 1.853 2719 × 10^280 || 193 | 6.851 4381 × 10^358
124 | 1.506 1417 × 10^207 || 159 | 2.946 7023 × 10^282 || 194 | 1.329 1790 × 10^361
125 | 1.882 6772 × 10^209 || 160 | 4.714 7236 × 10^284 || 195 | 2.591 8990 × 10^363
126 | 2.372 1732 × 10^211 || 161 | 7.590 7051 × 10^286 || 196 | 5.080 1221 × 10^365
127 | 3.012 6600 × 10^213 || 162 | 1.229 6942 × 10^289 || 197 | 1.000 7841 × 10^368
128 | 3.856 2048 × 10^215 || 163 | 2.004 4016 × 10^291 || 198 | 1.981 5524 × 10^370
129 | 4.974 5042 × 10^217 || 164 | 3.287 2186 × 10^293 || 199 | 3.943 2893 × 10^372
130 | 6.466 8555 × 10^219 || 165 | 5.423 9107 × 10^295 || 200 | 7.886 5787 × 10^374
131 | 8.471 5807 × 10^221 || 166 | 9.003 6917 × 10^297 ||
132 | 1.118 2487 × 10^224 || 167 | 1.503 6165 × 10^300 ||
133 | 1.487 2707 × 10^226 || 168 | 2.526 0757 × 10^302 ||
134 | 1.992 9427 × 10^228 || 169 | 4.269 0680 × 10^304 ||
135 | 2.690 4727 × 10^230 || 170 | 7.257 4156 × 10^306 ||
136 | 3.659 0429 × 10^232 || 171 | 1.241 0181 × 10^309 ||
137 | 5.012 8887 × 10^234 || 172 | 2.134 5511 × 10^311 ||
138 | 6.917 7865 × 10^236 || 173 | 3.692 7734 × 10^313 ||
139 | 9.615 7232 × 10^238 || 174 | 6.425 4257 × 10^315 ||
140 | 1.346 2012 × 10^241 || 175 | 1.124 4495 × 10^318 ||
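
Entries of this table can be regenerated exactly. The sketch below is one way to do so, assuming the tabulated mantissas are n! rounded to eight significant figures; the function name `factorial_entry` is hypothetical.

```python
import math
from decimal import Context, ROUND_HALF_EVEN


def factorial_entry(n, sig=8):
    """Return (mantissa, exponent) with n! ~ mantissa x 10^exponent.

    The exact integer n! is rounded to at most `sig` significant figures,
    which is assumed to match the rounding used in the printed table.
    """
    ctx = Context(prec=sig, rounding=ROUND_HALF_EVEN)
    rounded = ctx.create_decimal(math.factorial(n))
    _, digits, exp = rounded.as_tuple()
    text = "".join(str(d) for d in digits)
    mantissa = text[0] + ("." + text[1:] if len(text) > 1 else "")
    return mantissa, len(text) - 1 + exp


for n in (5, 36, 100, 200):
    mantissa, exponent = factorial_entry(n)
    print(f"{n:3d}! = {mantissa} x 10^{exponent}")
```

Because Python integers are exact, the only rounding happens in the final conversion to `sig` figures.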


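$(-\tfrac{1}{2})! = \sqrt{\pi}$; $(n+\tfrac{1}{2})! = (n+\tfrac{1}{2})(n-\tfrac{1}{2})(n-\tfrac{3}{2})\cdots\tfrac{3}{2}\cdot\tfrac{1}{2}\,\sqrt{\pi}$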
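
The half-integer factorials follow from $(-\tfrac{1}{2})! = \sqrt{\pi}$ by repeated multiplication. A minimal numerical check is sketched here, assuming the usual identification $x! = \Gamma(x+1)$; the function name is hypothetical.

```python
import math


def half_integer_factorial(n):
    """(n + 1/2)! for integer n >= -1, built up from (-1/2)! = sqrt(pi)."""
    value = math.sqrt(math.pi)  # (-1/2)!
    for k in range(n + 1):
        value *= k + 0.5  # factors 1/2, 3/2, ..., n + 1/2
    return value


# Compare with the gamma function: (n + 1/2)! = Gamma(n + 3/2).
difference = half_integer_factorial(3) - math.gamma(4.5)
```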




II. THE LOGARITHMS OF FACTORIALS 


n | log n! || n | log n! || n | log n!
1 | 0.000 000 0000 || 41 | 49.524 428 9249 || 81 | 120.763 212 7414
2 | 0.301 029 9957 || 42 | 51.147 678 2153 || 82 | 122.677 026 5938
3 | 0.778 151 2504 || 43 | 52.781 146 6709 || 83 | 124.596 104 6861
4 | 1.380 211 2417 || 44 | 54.424 599 3473 || 84 | 126.520 383 9722
5 | 2.079 181 2460 || 45 | 56.077 811 8611 || 85 | 128.449 802 8979
6 | 2.857 332 4964 || 46 | 57.740 569 6928 || 86 | 130.384 301 3492
7 | 3.702 430 5364 || 47 | 59.412 667 5507 || 87 | 132.323 820 6018
8 | 4.605 520 5234 || 48 | 61.093 908 7881 || 88 | 134.268 303 2739
9 | 5.559 763 0329 || 49 | 62.784 104 8681 || 89 | 136.217 693 2806
10 | 6.559 763 0329 || 50 | 64.483 074 8725 || 90 | 138.171 935 7900
11 | 7.601 155 7180 || 51 | 66.190 645 0486 || 91 | 140.130 977 1823
12 | 8.680 336 9641 || 52 | 67.906 648 3922 || 92 | 142.094 765 0097
13 | 9.794 280 3164 || 53 | 69.630 924 2618 || 93 | 144.063 247 9582
14 | 10.940 408 3521 || 54 | 71.363 318 0216 || 94 | 146.036 375 8118
15 | 12.116 499 6111 || 55 | 73.103 680 7111 || 95 | 148.014 099 4171
16 | 13.320 619 5938 || 56 | 74.851 868 7381 || 96 | 149.996 370 6502
17 | 14.551 068 5152 || 57 | 76.607 743 5938 || 97 | 151.983 142 3844
18 | 15.806 341 0203 || 58 | 78.371 171 5874 || 98 | 153.974 368 4601
19 | 17.085 094 6212 || 59 | 80.142 023 5990 || 99 | 155.970 003 6547
20 | 18.386 124 6169 || 60 | 81.920 174 8494 || 100 | 157.970 003 6547
21 | 19.708 343 9116 || 61 | 83.705 504 6844 || 101 | 159.974 325 0285
22 | 21.050 766 5924 || 62 | 85.497 896 3739 || 102 | 161.982 925 2003
23 | 22.412 494 4285 || 63 | 87.297 236 9234 || 103 | 163.995 762 4250
24 | 23.792 705 6702 || 64 | 89.103 416 8973 || 104 | 166.012 795 7643
25 | 25.190 645 6788 || 65 | 90.916 330 2540 || 105 | 168.033 985 0633
26 | 26.605 619 0268 || 66 | 92.735 874 1895 || 106 | 170.059 290 9286
27 | 28.036 982 7910 || 67 | 94.561 948 9922 || 107 | 172.088 674 7063
28 | 29.484 140 8223 || 68 | 96.394 457 9049 || 108 | 174.122 098 4618
29 | 30.946 538 8202 || 69 | 98.233 306 9957 || 109 | 176.159 524 9597
30 | 32.423 660 0749 || 70 | 100.078 405 0357 || 110 | 178.200 917 6449
31 | 33.915 021 7688 || 71 | 101.929 663 3844 || 111 | 180.246 240 6237
32 | 35.420 171 7471 || 72 | 103.786 995 8808 || 112 | 182.295 458 6463
33 | 36.938 685 6870 || 73 | 105.650 318 7410 || 113 | 184.348 537 0898
34 | 38.470 164 6040 || 74 | 107.519 550 4607 || 114 | 186.405 441 9411
35 | 40.014 232 6484 || 75 | 109.394 611 7241 || 115 | 188.466 139 7815
36 | 41.570 535 1491 || 76 | 111.275 425 3164 || 116 | 190.530 597 7707
37 | 43.138 736 8732 || 77 | 113.161 916 0415 || 117 | 192.598 783 6325
38 | 44.718 520 4698 || 78 | 115.054 010 6442 || 118 | 194.670 665 6398
39 | 46.309 585 0768 || 79 | 116.951 637 7355 || 119 | 196.746 212 6012
40 | 47.911 645 0682 || 80 | 118.854 727 7225 || 120 | 198.825 393 8472


Taken from the 18-place table of C. F. Degen, Havniae, 1824.


n | log n! || n | log n! || n | log n!
121 | 200.908 179 2175 || 161 | 286.880 282 1167 || 201 | 377.200 084 6975
122 | 202.994 539 0482 || 162 | 289.089 797 1313 || 202 | 379.505 436 0669
123 | 205.084 444 1597 || 163 | 291.301 984 7357 || 203 | 381.812 932 1048
124 | 207.177 865 8448 || 164 | 293.516 828 5837 || 204 | 384.122 562 2702
125 | 209.274 775 8578 || 165 | 295.734 312 5279 || 205 | 386.434 316 1333
126 | 211.375 146 4029 || 166 | 297.954 420 6160 || 206 | 388.748 183 3537
127 | 213.478 950 1239 || 167 | 300.177 137 0871 || 207 | 391.064 153 6991
128 | 215.586 160 0935 || 168 | 302.402 446 3688 || 208 | 393.382 217 0341
129 | 217.696 749 8038 || 169 | 304.630 333 0735 || 209 | 395.702 363 3202
130 | 219.810 693 1561 || 170 | 306.860 781 9948 || 210 | 398.024 582 6149
131 | 221.927 964 4518 || 171 | 309.093 778 1052 || 211 | 400.348 865 0702
132 | 224.048 538 3830 || 172 | 311.329 306 5521 || 212 | 402.675 200 9312
133 | 226.172 390 0240 || 173 | 313.567 352 6553 || 213 | 405.003 580 5346
134 | 228.299 494 8223 || 174 | 315.807 901 9035 || 214 | 407.333 994 3080
135 | 230.429 828 5908 || 175 | 318.050 939 9522 || 215 | 409.666 432 7679
136 | 232.563 367 4992 || 176 | 320.296 452 6200 || 216 | 412.000 886 5190
137 | 234.700 088 0664 || 177 | 322.544 425 8864 || 217 | 414.337 346 2529
138 | 236.839 967 1528 || 178 | 324.794 845 8887 || 218 | 416.675 802 7465
139 | 238.982 981 9530 || 179 | 327.047 698 9197 || 219 | 419.016 246 8613
140 | 241.129 109 9887 || 180 | 329.302 971 4248 || 220 | 421.358 669 5421
141 | 243.278 329 1014 || 181 | 331.560 649 9997 || 221 | 423.703 061 8158
142 | 245.430 617 4457 || 182 | 333.820 721 3876 || 222 | 426.049 414 7903
143 | 247.585 953 4832 || 183 | 336.083 172 4774 || 223 | 428.397 719 6533
144 | 249.744 315 9753 || 184 | 338.347 990 3004 || 224 | 430.747 967 6717
145 | 251.905 683 9775 || 185 | 340.615 162 0288 || 225 | 433.100 150 1898
146 | 254.070 036 8333 || 186 | 342.884 674 9730 || 226 | 435.454 258 6289
147 | 256.237 354 1681 || 187 | 345.156 516 5795 || 227 | 437.810 284 4861
148 | 258.407 615 8835 || 188 | 347.430 674 4288 || 228 | 440.168 219 3331
149 | 260.580 802 1519 || 189 | 349.707 136 2330 || 229 | 442.528 054 8154
150 | 262.756 893 4109 || 190 | 351.985 889 8339 || 230 | 444.889 782 6515
151 | 264.935 870 3582 || 191 | 354.266 923 2012 || 231 | 447.253 394 6314
152 | 267.117 713 9462 || 192 | 356.550 224 4299 || 232 | 449.618 882 6162
153 | 269.302 405 3770 || 193 | 358.835 781 7389 || 233 | 451.986 238 5373
154 | 271.489 926 0978 || 194 | 361.123 583 4688 || 234 | 454.355 454 3947
155 | 273.680 257 7960 || 195 | 363.413 618 0802 || 235 | 456.726 522 2570
156 | 275.873 382 3943 || 196 | 365.705 874 1515 || 236 | 459.099 434 2599
157 | 278.069 282 0468 || 197 | 368.000 340 3777 || 237 | 461.474 182 6059
158 | 280.267 939 1337 || 198 | 370.297 005 5680 || 238 | 463.850 759 5630
159 | 282.469 336 2580 || 199 | 372.595 858 6444 || 239 | 466.229 157 4639
160 | 284.673 456 2407 || 200 | 374.896 888 6400 || 240 | 468.609 368 7056


n | log n! || n | log n! || n | log n!
241 | 470.991 385 7482 || 281 | 567.673 298 3669 || 321 | 666.832 041 9066
242 | 473.375 201 1142 || 282 | 570.123 547 4752 || 322 | 669.339 897 7783
243 | 475.760 807 3878 || 283 | 572.575 333 9108 || 323 | 671.849 100 3006
244 | 478.148 197 2141 || 284 | 575.028 652 2508 || 324 | 674.359 645 3108
245 | 480.537 363 2985 || 285 | 577.483 497 1108 || 325 | 676.871 528 6718
246 | 482.928 298 4056 || 286 | 579.939 863 1439 || 326 | 679.384 746 2718
247 | 485.320 995 3589 || 287 | 582.397 745 0407 || 327 | 681.899 294 0245
248 | 487.715 447 0397 || 288 | 584.857 137 5284 || 328 | 684.415 167 8682
249 | 490.111 646 3868 || 289 | 587.318 035 3712 || 329 | 686.932 363 7662
250 | 492.509 586 3955 || 290 | 589.780 433 3691 || 330 | 689.450 877 7060
251 | 494.909 260 1169 || 291 | 592.244 326 3581 || 331 | 691.970 705 6998
252 | 497.310 660 6577 || 292 | 594.709 709 2095 || 332 | 694.491 843 7835
253 | 499.713 781 1789 || 293 | 597.176 576 8299 || 333 | 697.014 288 0170
254 | 502.118 614 8955 || 294 | 599.644 924 1603 || 334 | 699.538 034 4838
255 | 504.525 155 0760 || 295 | 602.114 746 1763 || 335 | 702.063 079 2909
256 | 506.933 395 0413 || 296 | 604.586 037 8873 || 336 | 704.589 418 5683
257 | 509.343 328 1646 || 297 | 607.058 794 3367 || 337 | 707.117 048 4691
258 | 511.754 947 8706 || 298 | 609.533 010 6007 || 338 | 709.645 965 1694
259 | 514.168 247 6346 || 299 | 612.008 681 7891 || 339 | 712.176 164 8676
260 | 516.583 220 9826 || 300 | 614.485 803 0438 || 340 | 714.707 643 7847
261 | 518.999 861 4900 || 301 | 616.964 369 5394 || 341 | 717.240 398 1636
262 | 521.418 162 7813 || 302 | 619.444 376 4823 || 342 | 719.774 424 2697
263 | 523.838 118 5298 || 303 | 621.925 819 1108 || 343 | 722.309 718 3897
264 | 526.259 722 4566 || 304 | 624.408 692 6944 || 344 | 724.846 276 8323
265 | 528.682 968 3306 || 305 | 626.892 992 5338 || 345 | 727.384 095 9274
266 | 531.107 849 9672 || 306 | 629.378 713 9603 || 346 | 729.923 172 0262
267 | 533.534 361 2286 || 307 | 631.865 852 3357 || 347 | 732.463 501 5010
268 | 535.962 496 0226 || 308 | 634.354 403 0522 || 348 | 735.005 080 7449
269 | 538.392 248 3026 || 309 | 636.844 361 5317 || 349 | 737.547 906 1719
270 | 540.823 612 0668 || 310 | 639.335 723 2255 || 350 | 740.091 974 2162
271 | 543.256 581 3576 || 311 | 641.828 483 6145 || 351 | 742.637 281 3327
272 | 545.691 150 2617 || 312 | 644.322 638 2085 || 352 | 745.183 823 9962
273 | 548.127 312 9087 || 313 | 646.818 182 5461 || 353 | 747.731 598 7016
274 | 550.565 063 4715 || 314 | 649.315 112 1942 || 354 | 750.280 601 9636
275 | 553.004 396 1654 || 315 | 651.813 422 7480 || 355 | 752.830 830 3166
276 | 555.445 305 2474 || 316 | 654.313 109 8306 || 356 | 755.382 280 3146
277 | 557.887 785 0165 || 317 | 656.814 169 0928 || 357 | 757.934 948 5307
278 | 560.331 829 8124 || 318 | 659.316 596 2128 || 358 | 760.488 831 5574
279 | 562.777 434 0157 || 319 | 661.820 386 8958 || 359 | 763.043 926 0060
280 | 565.224 592 0470 || 320 | 664.325 536 8742 || 360 | 765.600 228 5067




n 


37 
372 
373 
374 
375 


376 
Lyi) 
378 
379 
380 


381 
382 
383 
384 
385 


386 
387 
388 
389 
Soe 


Shs 
SOE 
393 
394 
395 


396 
397 
398 
399 
400 


log n! n log n! n log n! 
768.157 735 7086 || 401 | 871.409 558 5503 || 441 | 976.394 942 
770.716 444 2792 || 402 | 874.013 784 6034 || 442 | 979.040 365 
773.276 350 9042 || 403 | 876.619 089 6496 || 443 | 981.686 768 
775-837 452 2878 || 404 | 879.225 471 0147 || 444 | 984.334 IST 
778.399 745 1523 || 405 | 881.832 926 0379 || 445 | 986.982 511 
780.963 226 2377 406 | 884.441 452 O715 446 | 989.631 846 
783.527 892 3019 407 | 887.051 046 4807 447 | 992.282 154 
786.093 740 1206 |] 408 | 889.661 706 6438 448 | 994-933 432 
788.660 766 4868 409 | 892.273 429 9518 449 | 997.585 678 
791.228 968 2108 410 | 894.886 213 8085 450 | 1000.238 8g0 
793-798 342 1205 || 411 | 897.500 055 6304 || 451 | 1002.893 067 
796.368 885 0603 |} 412 | goo.114 952 8464 || 452 | 1005.548 205 
798.940 593 8922 413 | 902.730 go2 8981 453°| 1008.204 304 
801.513 465 4944 || 414 | 905.347 903 2392 || 454 | Io10.861 359 
804.087 496 7621 || 415 | 907.965 951 3359 || 455 | 1013.519 371 
806.662 684 6070 416 | 910.585 044 6665 456 | 1016.178 336 
809.239 025 9572 417 | 913-205-180 7215 457 | 1018.838 252 
811.816 517 7571 418 | 915.826 357 0033 458 | 102.499 117 
814.395 156 9670 419 | 918.448 571 0263 459 | 1024.160 930 
816.974 940 5636 || 420 | 921.071 820 3167 || 460 | 1026.823 688 
819.555 865 5393 || 421 | 923.696 102 4125 || 461 | 1029.487 389 
822.137 928 go22 || 422 | 926.321 414 8635 462 | 1032.152 O31 
824.721 127 6762 || 423 | 928.947 755 2308 || 463 | 1034.817 612 
827.305 458 go06 |) 424 | 931.575 121 0874 || 464 | 1037.484 130 
829.890 919 6301 425 | 934.203 510 O175 465 | 1040.151-583 
832.477 506 9347 426 | 936.832 919 6166 466 | 1042.819 969 
835.065 217 8998 || 427 | 939.463 347 4916 || 467 | 1045. 489 286 
837.654 049 6254 || 428 | 942.1094 791 2606 || 468 | 1048.159 531 
840.243 999 2267 || 429 | 944.727 248 5528 || 469 | 1050.830 704 
842.835 063 8337 430 | 947.360 717 0084 470 | 1053.502 802 
845.427 240 SgIt || 431 | 949.995 194 2785 || 471 | 1056.175 823 
848.020 526 6581 432 | 952.630 678 0254 472 | 1058.849 765 
850.614 919 2085 433 | 955.267 165 9217 473 | 1061.524 626 
853.210 415 4303 || 434 | 957.904 655 6512 |) 474 | 1064.200 404 
855.807 012 §259.|/ 435 | 960.543 144 9082 || 475 | 1066.877 098 
858.404 707 7119 || 436 | 963.182 631 3974 || 476 | 1069.554 705 
861.003 498 2186 || 437 | 965.823 112 8344 477 | 1072.233 222 
863.603 381 2907 || 438 | 968.464 586 9449 478 | 1074.912 651 
866.204 354 1864 || 439 | 971.107 051 4652 || 479 | 1077.592 987 
868.806 414 1777 440 | 973.750 504 1416 480 | 1080.274 228 


7311 
0005 
7267 
6968 
7078 


5665 
0896 
1036 
4446 
9584 


5083 
93a" 
1371 
ggoo 
3866 


2293 
4293 
9°74 
5929 
4246 


3500 
3255 
3165 
2971 
2500 


1667 
0472 
9903 
7432 
6010 


5081 
5067 
6475 
9891 
5988 


5515 
9305 
8271 
3405 
5779 




log n! 


log n! 


log n! 


1082.956 373 6543 
1085 .639 420 6925 
1088 . 323 367 8233 
IogI.008 213 1849 


1093.693 954 9235 


1096.380 S91 1928 
1099.068 120 1540 
I101.756 539 9760 
1104.445 848 8351 
II07.136 044 9152 


1109.827 126 4073 
III2.$1g OgI SIOI 
TII§.211 938 4293 
III7.905 665 3783 
1120.600 270 5772 


£223-295 752 2537 
II25.992 108 6424 
1128 .689 337 9852 
1131. 387 438 5308 
1134.086 408 5351 


1136.786 246 2610 
1139-486 949 9781 
T142.188 517 9632 
1144.890 948 4996 
1147.594 239 8778 


I150.298 390 3946 
1153.003 398 3539 
1155.709 262 0662 
1158.415 979 8486 
T161.123 550 0247 


1163.831 970 9248 
1166.541 240 8858 
1169.251 358 2509 
I171.962 321 3699 
1174.674 128 5989 


1177.386 778 3005 
1180,100 268 8436 
1182.814 598 6034 
1185.529 765 9612 
1188.245 769 3049 


1190.962 607 0282 
1193.680 277 5312 
1196.398 779 2200 
I1g99.118 110 5070 
1201 .838 269 8104 


1204.559 255 5546 


1207.281 066 1698 


1210.003 700 0923 
Wp Ib) phy Gals ley. Ma 
1215.45 431 6340 


1218.176 526 1550 
1220.902 437 7873 
1223.629 164 9964 
1226.356 706 2534 
1229.085 060 0354 


1231.814 224 8251 
1234.544 Ig9 1108 
1237.274 981 3865 
1240.006 570 1517 
1242.738 963 9115 


1245.472 161 1766 
1248.206 160 4631 
1250.940 960 2927 
1253.676 559 1924 
1256.412 955 6947 


1259.150 148 3374 
1261.888 135 6637 
1264.626 gi6 2222 
1267.366 488 5667 
1270.106 851 2562 


1272.848 002 8550 
1275. 589 941 9327 
1278 .332 667 0640 
1281.076 176 8288 
1283.820 469 8119 


1286.565 544 6035 
1289 311 399 7987 
1292.058 033 9976 
1294.805 445 8055 
1297-553 633 8325 


S71 
Sure 
573 
574 
oa) 


576 
577 
578 
579 
580 


581 
582 
583 
584 
585 


586 
587 
588 
589 
59° 


390 
oe 
8 
594 
595 


596 
597 
598 
599 
600 


1300. 302 596 6937 
Bess h 2 333.2003 
1305.802 841 4042 
1308.554 120 S081 
1311.306 168 9560 


1314.058 985 3871 
1316.812 568 4460 
1319.566 916 7818 
1322.322 029 0481 
1325-077 903 9038 


1327.834 540 O121 
1330. 591 936 O409 
1333-350 O90 6628 
1336. 109 002 5552 
1338 .868 670 3999 


1341 .629 092 8833 
1344.390 268 6965 
1347-152 196 5349 
1349.914 875 0986 
1352.678 303 0922 


1355-442 479 2246 
1358.207 402 2092 
1360.973 070 7640 
1363.739 483 6111 
1366. 506 639 4772 


1369.274 537 0932 
13/2.043 375, 1945 
1374852 $52 §205 
1377-582 667 8153 
1380.353 519 8270 


1383.125 107 3079 
1385.897 429 0146 
1388.670 483 7079 
1391.444 270 1529 
1394.218 787 1186 


1396 .994 033 3784 
1399-779 007 7°95 
1402.546 708 8935 
1405 .324 135 7159 
1408.102 286 9663 



n log n! n log n! n log n! 

6o1 | 1410.881 161 4383]/ 641 | 1522.615 809 4311|| 681 | 1635.434 357 6708 
602 | 1413.660 757 9295|| 642 | 1525.423 344 4591|| 682 | 1638.268 142 0455 
603 | 1416.441 075 2417|| 643 | 1528.231 555 4320] 683 | 1641.102 562 7492 
604 | 1419.222 112 1803]] 644 | 1531.040 441 2994]| 684 | 1643.937 618 8509 
605 | 1422.003 867 5550|| 645 | 1533.850 COI O140)| 685 | 1646.773 309 4224 
606 | 1424.786 340 1791|| 646 | 1536.660 233 5320]| 686 | 1649.609 633 5381 
607 | 1427.569 528 8702|| 647 | 1539.471 137 8127 687 | 1652.446 S90 2752 
608 | 1430.353 432 4495|| 648 | 1542.282 712 8186)| 688 | 1655.284 178 7134 
609 | 1433-138 049 7421] 649 | 1545-094 957 5154]/ 689 | 1658.122 397 9353 
610 | 1435.923 379 5771|| 650 | 1547.907 870 8720]| 690 | 1660.961 247 0260 
611 |. 1438.709 420 7874|| 651 | 1550.721 451 8606|| 691 | 1663.800 725 0734 
612 | 1441.496 172 2095|| 652 | 1553.535 699 4563]| 692 | 1666.640 831 1679 
613 | 1444-283 632 6840|] 653 | 1556.350 612 6376|| 693 | 1669.481 564 4025 
614 | 1447.071 Bor 0552/| 654 | 1559.166 190 3859|| 694 | 1672.322 923 8729 
615 | 1449.860 676 1709|| 655 | 1561.982 431 6859]/ 695 | 1675.164 908 6775 
616 | 1452.650 256 8831|| 656 | 1564.799 335 5253|| 696 | 1678.007 517 9171 
617 | 1455.440 $42-0471|| 657 | 1567.616 goo. 8948|| 697 | 1680.850 750 6952 
618 | 1458.231 530 §222|| 658 | 1570.435 126 7885|/ 698 | 1683.694 606 1179 
619 | 1461.023 221 1712|| 659 | 1573.254 O12 2031|/ 699 | 1686.539 083 2936 
620 | 1463.815 612 8607|| 660 | 1576.073 556 1386|) 700 | 1689.384 181 3336 
621 | 1466.608 704 4609|| 661 | 1578.893 757 §31|| Jor | 1692.229 899 3516 
622 | 1469.402 494 8456)| 662 | 1581.714 615 §875|/ 702 | 1695.076 236 4637 
623 | 1472.196 982 8923|| 663 | 1584.536 129 1159]/ 703 | 1697.923 191 7887 
624 | 1474.992 167 4819|| 664 | 1587.358 297 1953] 704 | 1700.770 764 4479 
625 | 1477-788 047 4993|| 665 | 1590.181 118 8406] 705 | 1703.618°953 5649 
626 | 1480.584 621 8325|| 666 | 1593.004 593 0698|/ 706 | 1706.467 758 2659 
627 | 1483.381 889 3733|/ 667 | 1595.828 718 9037|| 707 | 1709.317 177 6797 
628 | 1486.179 849 o171|| 668 | 1598-653 495 3662|/ 708 | 1712.167 210 9374 
629 | 1488.978 499 6625]| 669 | 1601.478 921 4839|/ 709 | 1715.017 857 1726 
630 | 1491.777 840 2120|| 670 | 1604.304 996 2866]) 710 | 1717.869 115 5213 
631 | 1494.577 869 5712|| 671 | 1607.131 718 8068|| 711 | 1720.720 985 1220 
632 | 1497.378 586 6495)| 672 | 1609.959 088 0798|| 712 | 1723.573 465 1157 
633 | 1$00.179 990 3595|| 673 | 1612.787 103 1441]| 713 | 1726.426 554 6455 
634 | 1502.982 079 6174|| 674 | 1615.615 763 0406]/ 714 | 1729.280 252 8573 
635 | 1505-784 853 3427|| 675 | 1618.445 066 8134|| 715 | 1732.134 558 8991 
636,| 1508.588 310 4583]/ 676 | 1621.275 013 Sog4)| 716 | 1734.989 471 9214 
637 | 1511.392 449 8907|| 677 | 1624.105 602 1781|| 717 | 1737-844 991 0771 
638 | 1514.197 270 5694)| 0678 | 1626.936 831 8719]| 718 | 1740.701 116 5213 
639 | 1517.002 771 4275|| 679 | 1629.768 Jor 6462|] 719 | 1743.557 844 4117” 
640 | 1519-808 951 4015|| 680 | 1632.601 210 §589|| 720 | 1746.415 176 go81 




log n! n log n! n log n! 
1749-273 112 1728] 761 | 1864.075 452 8940|| 801 | 1979.790 716 7537 
1752.131 649 3704|| 762 | 1866.957 407 8654|| 802 | 1982.694 891 1220 
1754-990 787 6677|| 763 | 1869.839 932 4033|| 803 | 1985.599 606 6673 
1757.850 526 2339|| 764 | 1872.723 025 761g|| 804 | 1988.504 862 7160 
1760.710 864 2405|) 765 | 1875.606 687 1971|| 805 | 1991.410 658 5964 
1763.571 800 8612|| 766 | 1878.490 915 9667|| 806 | 1994.316 993 6382 
1766.433 335 2720|| 767 | 1881.375 711 3306|| 807 | 1997.223 867 1729 
769.295 466 6514|| 768 | 1884.261 072 5507|| 808 | 2000.131 278 5337 
1772.158 194 1797|| 769 | 1887.146 998 8905]| 809 | 2003.039 227 0553 
1775.021 $17 0398]| 770 | 1890.033 489 6156]| 810 | 2005.947 712 0742 
1777-885 434 4167|| 771 | 1892.920 543 9937|| 811 | 2008.856 732 9284 
1780.749 945 4978|| 772 | 1895.808 161 2940]| 812 | 2011.766 288 9576 
1783-615 049 4724|| 733 | 1898.696 340 7879|| 813 | 2014.676 379 5032 
1786.480 745 5324|| 774 | 1901.585 O81 7486]| 814 | 2017.587 003 go81 
1789-347 032 8714|| 775 | 1904.474 383 4511|| 815 | 2020.498 161 5169 
1792.213 910 6858|| 776 | 1907.364 245 1724]| 816 | 2023.409 851 6756 
1795.081 378 1736], 777 | 1910.254 666 1912]| 817 | 2026.322 073 7322 
1797-949 434 5355|| 778 | 1913-145 645 7882|| 818 | 2029.234 827 0358 
1800.818 078 9739|| 779 | 1916.037 183 2459]| 819 | 2032.148 110 9376 
1803.687 310 6936); 780 | 1918.929 277 8485]| 820 | 2035.061 924 7900 
1806.557 128 go16|| 781 | 1921.821 928 8824|| 821 | 2037.976 267 9471 
1809.427 532 8069|| 782 | 1924.715 135 6355]| 822 | 2040.891 139 7646 
1812.298 521 6206|/ 783 | 1927.608 897 3975|| 823 | 2043.806 539 5998 
1815.170 094 5562|| 784 | 1930.503 213 4602|/ 824 | 2046.722 466 8115 
1818.042 250 8289] 785 | 1933-398 083 I170|| 825 | 2049.638 920 7601 
1820.914 989 6564|| 786 | 1936.293 505 6630|) 826 | 2052.555 goo 8074 
*1823.788 310 -2582|| 787 | 1939.189 480 3954)| 827 | 2055.473 406 3170 
1826.662 211 8561|| 788 1942.086 006 6129]| 828 | 2058.391 436 6537 
1829.536 693 6738|| 789 | 1944.983 083 6161/| 829 | 2061.309 ggi 1843 
1832. 411 754 9372|| 79° | 1947.880 710 7074]|| 830 | 2064.229 069 2767 
1835.287 394 8742)| 791 | 1950.778 887 Igog|| 831 2067.148 670 3005 
1838.163 612 7147]| 792 | 1953.677 612 3724|| 832 | 2070.068 793 6267 
1841.040 407 6909]| 793 | 1956.576 885 5598|| 833 | 2072.989 438 6282 
1843-917 779 0368|| 794 | 1959-476 706 0622|| 834 | 2075.910 604 6788 
1846.795 725 9884|/ 795 | 1962.377 073 1908|| 835 | 2078.832 291 1543 
1849.674 247 7839|| 796 | 1965.277 986 2586|| 836 | 2081.754 497 4317 
1852.553 343 6634] 797 | 1968.179 444 5800] 837 | 2084.677 222 8897 
1855.433 O12 8691|| 798 | 1971.08 447 4713|| 838 | 2087.600 466 9083 
1858.313 254 6450]| 799 | 1973-983 994 2506|| 839 | 2090. 524 228 8692 
1861.194 068 2373]| 800 | 1976.887 084 2376/| 840 | 2093.448 508 1552 




log n! 


845 


850 


851 


855 


860 


861 


866 
867 


875 


880 


2096 . 373 304. 1510 
2099.298 -616 2425 
2102.224 443 8171 
2105.150 786 2638 
2108.077 642 9727 


2III.00§ O13 3358 
2113-932 896 7461 
2116.861 292 5984 
2119.790 200 2886 
2122.719 619 2143 


2125.649 548 7744 
2128.579 988 3692 
2131.510 937 4003 
2134.442 395 2710 
2137.374 361 3857 


2140.306 835 1504 
2143.239 815 9723 
2146.173 303 2602 
2149.107 296 4240 
2152.041 794 8753 


2154.976 798 0267 
2157.912 305 2925 
2160.848 316 0883 
2163.784 829 8307 
2166.721 845 9382 


2169.659 363 8302 
2172.597 382 9277 
2175.535 go2 6529 
2178 .474. 922 4293 
2181.414 441 6819 


2184.354 459 8370 
2187.294 976 3219 
2190.235 990 5656 
2193.177 501 9982 
2196.119 510 0512 


2199.062 014 1574 
2202.005 013 7508 
2204.948 508 2667 
2207.892 497 1418 
2210.836 979 8139 


log n! 


log n! 


2213.781 955 7223 
2.216.727 424 3075 
2219.673 385 O10 
2222.619 837 2760 
2225.666 780 5467 


2228.514 214 2686 
2231.462 137 8885 
2234.410 550 8542 
2237-359 452 6152 
2240,308 842 6219 


2243.258 720 3259 
2.246.209 085 1803 
2249.159 936 6392 
2252.111 274 1580 


2255.063 097 1933 


2.258.015 405 2029 
2260.968197 6460 
2263.921 473 9826 


2266.875 233 6744 
2269.829 476 


2272.784 200 9748 
2275-739 407 $123 
2278 .695 095 2626 
2281.651 263 6931 
2284 .607 g12 2723 


2287.565 040 4700 
2290.522 647 7571 
2293.480 733 6056 
2296 .439 297 4888 
2299.398 338 8811 


2302.357 857 2581 
2305.317 852 0964 
2308.278 322 8740 


, 2311.239 269 0697 


2314.200 690 1638 


2317.162 585 6374 


2320.124 954. 9731 
2323 .087 797 6543 
2326.051 113 1657 


2329.014 900. 9930 


1838] 


2331-979 160 6232 
2334-943 891 5443 
2337-909 093 2453 
2340.874 765 2165 
2343-840 906 9493 


2346.807 517 9360 
2349-774 597 6701 
2352.742 145 6463 
2355.710 161 3603 
2358.678 644 3089 


2361 .647 593 9898 
2364.617 00g 9022 
2367.586 891 5459 
2370.557 238 4222 
2373-528 050 0331 


2376.499 325 8818 
2379-471 065 4727 
2382.443 268 3111 
2385.415 933 9033 
2388. 389 061 7569 


2391.362 651 3803 
2394 ..336 702 2831 
7397-3 Ue ZT Sei Do 
2400.286 185 9702 
2403.261.617 7787 . 


2406.237 508 gI51 
2409.213 858 8941 
2412.190 667 2314 
2415.167 933 4439 
2418.145 657 O4gI 


2421.123 837 5661 
2424.102 474 5145 
2427.081 567 4151 
2430.061 115 7898 
2433-041 11g 1614 


2436.021 577 0537 
2439.002 488 9914 
2441.983 854 5005 
2444.965 673 1077 
2447-947 944 3407 




log n! 


log n! 


971 


974 
975 


SH, 


g81 


985 


989 
99° 


ee 
wee 
993 
994 
995 


996 
997 
998 
999 
1000 


2450 


2465 


2471 


2474. 


930 667 7284 
2453. 
2456. 
2459. 
2462. 


913 842 8004 
897 469 0876 
881 546 1215 
866 073 4348 


.851 O50 5612 
2468. 


836 477 0353 


.822 352 3926 


808 676 1697 


2477-795 447 


2480.782 667 
2483-779 333 
2486.758 446 
2489.747 005 
2492.736 00g 


2495-725 459 
2498.715 354 
2501.705 693 
2504.696 475 
2507.687 701 


2510.679 370 
2513.671 482 
2516.664 035 
2519.657 030 
2522.650 467 


2525.644 344 
2528 638 661 
2531 .633 418 
2534.628 614 


2537-624 249 


2540.620 323 
2543-616 834 
2546.613 784 
2549.611 170 
2552.608 993 


2555.607 253 
2558.605 948 
2561 .605 078 
2564.604 644 
2567.604 644 


9239 


1338 
3988 
2390 
1959 
8116 


6293 
1930 
0478 
7396 
8153 


8227 
3105 
8283 
9267 
1572 


0722 
2248 
1694 
4610 
6556 


3101 
9822 
Zhe 
6151 
6959 


0343 
1926 
7339 
2221 
2221 


2570.605 078 
2573-605 946 
2576.607 246 
2579.608 980 
2582.611 146 


2996 
O211 
9542 
6670 
7287 


285.613 744 
2588.616 774. 
2591.620 234 
2594.624 125 
2597.628 447 


7094 
1800 
7121 
8783 
2521 


2600 .633 
2603 .638 


198 4077 
378 9202 
2606 .643 988 3656 
2609.650 026 3206 
2612.656 492 3628 


2615.663 386 0708 
2618 .670 707 0237 
2621.678 454 8017 
2624.686 628 9857 


2627.695 229 1575 


2630.704 254 8996 
2633-713 705 7954 
2636.723 581 42y1 
2639.733 881 3857 
2642.744 605 2511 


2645-755 752 6119 
2648.767 323 0555 


| 2651.779 316 1701 


2654.791 731 5449 
2657.804 568. 7696 


2660. 
2663. 
2666. 
2669. 
2672. 


817 827 4349 
831 507 1322 
845 607 4537 
860 127 9925 
875 068 3422 


2675 .890 428 0977 
2678.906 206 8540 
2681 .922 404 2076 
2684.939 O19 7551 
2687.956 053 0944 


2690 


2693 
2697 
2700 
2793 


2706 
2729 


2719). 
2716. 


log n! 


-973 $03 8239 
“991 371 5429 
.009 655 8513 
.028 356 3500 
-047 472 6404 


.067 004 3250 
.086 951 0066 
107 312 2893 
128 087 7775 


2718.149 277 0765 
2721. 
2724.192 
2727.25 
2730.238 
2733.261 


170 879 7926 
895 5324 
323 9036 
164 5145 
416 9741 


080 
195 
641 
Ss 
843 


285 
309 
333 
358 
383 


8923 
8796 
5473 
5074 
3727 


2736. 
2739. 
2742. 
2745. 
2748. 


7566 
2733 
5378 
1658 
7736 


558 
683 
216 
158 
$97 


2751.409 
2754-435 
2757.462 
2760.489 
2763.516 


264 
429 
000 


978 
362 


2766. 544 
2769 .572 
2772.601 
2775 .629 
2778 .659 


9783 
3977 
6504 
3556 
1333 


2781 .689 
2784.719 
2787. 749 
2790. 780 
2793-812 


151 6041 
346 3895 
946 1114 
950 3928 
358 8570 


1284 
8317 


2796. 844. 
2799 .876 386 
2802.909 005 5925 
2805 .942 027 0372 
2808 .975 45° 7927 


171 




log n! 


log n! 


log n! 


2812.009 276 4866 
2815 .043 503 7474 
2818 .078 132 2040 
2821.113 161 4862 


2824.148 591 2244 


2827.184 421 0497 
2830.220 650 5938 
2833.257 279 4891 
2836.294 307 3689 
2839-331 733 8668 


6174 
2558 
4177 
WSS) 
8589 


2842.369 558 
2845.407 781 
2848.446 401 
2851.485 418 
2854.524 832 


2857.564 643 4131 
2860.604 850 0406 
2863.645 452 3807 
2866.686 450 0732 
2869.727 842 7583 


2872.769 630 0773 
2875.811 811 6718 
2878 .854 387 1843 
2881 .897 356 2576 
2884.940 718 5357 
2887.984 473 6626 
2891 .028 621 2835 
2894.073 161 0439 
2897.118 092 Sgo1 
2900.163 415 5688 


2903.209 129 6278 
2906 .25¢ 234 4150 
29D JOT 7795794 
2912.348 614 7702 
2915.395 889 6376 


2918.443 553 8322 
2921.491 607 0053 
2924.540 048 8089 
2927. 588 878 8954 
2930.638 096 9181 


5306 
3876 
1438 
4551 
9775 


2933.687 702 
2936.737 695 
2939. 788 075 
2942.838 841 
2945 .889 993 


3680 
2841 
3837 
3256 
7691 


2948 .941 532 
2951.993 456 
2955.045 765 
2958.098 459 
2961.151 537 


2964.205 000 3741 
2967.258 846 8009 
2970.313 076 7108 
2973-367 689 7653 
2976.422 685 6269 


2979-478 063 9582 
2982.533 824-4229 
2985.589 966 6850 
2988 .646 490 4091 
2991 .703 395 2604 


2994.760 680 9048 
2997-818 347 0087 
3000.876 393 2391 
3003 .934 819 2636 
3006 .993 624 7502 


3010.052 809 3679 
3013.112 372 7858 
3016.172 314 6738 
3019 .232 634 7025 
3022.293 332 5429 


3025.354 407 8665 
3028.415 860 3456 
3031.477 689 6529 
3034-539 895 4617 
3037-602 477 4459 


3040.665 435 2800 
3043-728 768 6390 
3046.792 477 1984 
3049-856 560 6343 
3052.921 018 6236 


8433 
9714 
6861 
6664 


5918 


3055-985 850 
3059.051 056 
3062.116 636 
3065 .182 589 
3068.248 915 


614 
684 
127 


942 
128 


1422 
9982 
8410 
3522 
2139 


3071 .315 
3074. 382 
3°77 -45° 
3080. 517 
3083 . 586 
3086.654 685 10g0 
3089.723 612 7207 
3092.792 gIO 7328 
3095 .862 578 8297 
3098 .932 616 6963 


3102 .003 024 o180 
3105 .073 800 4809 
3108 .144 945 7713 
3111.216 459 5764 
3114.208 341 5837 


3117.360 S91 4813 
3120. 433 208 9579 
3123.506 193 7025 

3126.579 545 4049 
3129-653 263-7553 


3132.727 348 4443
3135.801 799 1632
3138.876 615 6039
3141.951 797 4585
3145.027 344 4199


3148.103 256 1814
3151.179 532 4368
3154.256 172 8805
3157.333 177 2072
3160.410 545 1125 


3163.488 276 2922 
3166.566 370 4426
3169.644 827 2606
3172.723 646 4437 
3175.802 827 6898 




III. THE BINOMIAL COEFFICIENTS, $C_n^m$


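 n | m=0 | m=1 | m=2 | m=3 | m=4 | m=5 | m=6 | m=7 | m=8 | m=9 | m=10
 0 |   1 |   1 |   1 |   1 |   1 |   1 |   1 |   1 |   1 |   1 |    1
 1 |     |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |   10
 2 |     |     |   1 |   3 |   6 |  10 |  15 |  21 |  28 |  36 |   45
 3 |     |     |     |   1 |   4 |  10 |  20 |  35 |  56 |  84 |  120
 4 |     |     |     |     |   1 |   5 |  15 |  35 |  70 | 126 |  210
 5 |     |     |     |     |     |   1 |   6 |  21 |  56 | 126 |  252
 6 |     |     |     |     |     |     |   1 |   7 |  28 |  84 |  210
 7 |     |     |     |     |     |     |     |   1 |   8 |  36 |  120
 8 |     |     |     |     |     |     |     |     |   1 |   9 |   45
 9 |     |     |     |     |     |     |     |     |     |   1 |   10
10 |     |     |     |     |     |     |     |     |     |     |    1

 n |  m=11 |  m=12 |  m=13 |  m=14 |  m=15 |  m=16 |  m=17 |  m=18 |  m=19
 0 |     1 |     1 |     1 |     1 |     1 |     1 |     1 |     1 |     1
 1 |    11 |    12 |    13 |    14 |    15 |    16 |    17 |    18 |    19
 2 |    55 |    66 |    78 |    91 |   105 |   120 |   136 |   153 |   171
 3 |   165 |   220 |   286 |   364 |   455 |   560 |   680 |   816 |   969
 4 |   330 |   495 |   715 |  1001 |  1365 |  1820 |  2380 |  3060 |  3876
 5 |   462 |   792 |  1287 |  2002 |  3003 |  4368 |  6188 |  8568 | 11628
 6 |   462 |   924 |  1716 |  3003 |  5005 |  8008 | 12376 | 18564 | 27132
 7 |       |       |  1716 |  3432 |  6435 | 11440 | 19448 | 31824 | 50388
 8 |       |       |       |       |  6435 | 12870 | 24310 | 43758 | 75582
 9 |       |       |       |       |       |       | 24310 | 48620 | 92378
10 |       |       |       |       |       |       |       |       | 92378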
 n | m = 20      | m = 21      | m = 22      | m = 23      | m = 24      | m = 25
 1 | 2.0¹        | 2.1¹        | 2.2¹        | 2.3¹        | 2.4¹        | 2.5¹
 2 | 1.90²       | 2.10²       | 2.31²       | 2.53²       | 2.76²       | 3.00²
 3 | 1.140³      | 1.330³      | 1.540³      | 1.771³      | 2.024³      | 2.300³
 4 | 4.845³      | 5.985³      | 7.315³      | 8.855³      | 1.062 6⁴    | 1.265 0⁴
 5 | 1.550 4⁴    | 2.034 9⁴    | 2.633 4⁴    | 3.364 9⁴    | 4.250 4⁴    | 5.313 0⁴
 6 | 3.876 0⁴    | 5.426 4⁴    | 7.461 3⁴    | 1.009 47⁵   | 1.345 96⁵   | 1.771 00⁵
 7 | 7.752 0⁴    | 1.162 80⁵   | 1.705 44⁵   | 2.451 57⁵   | 3.461 04⁵   | 4.807 00⁵
 8 | 1.259 70⁵   | 2.034 90⁵   | 3.197 70⁵   | 4.903 14⁵   | 7.354 71⁵   | 1.081 575⁶
 9 | 1.679 60⁵   | 2.939 30⁵   | 4.974 20⁵   | 8.171 90⁵   | 1.307 504⁶  | 2.042 975⁶
10 | 1.847 56⁵   | 3.527 16⁵   | 6.466 46⁵   | 1.144 066⁶  | 1.961 256⁶  | 3.268 760⁶
11 |             | 3.527 16⁵   | 7.054 32⁵   | 1.352 078⁶  | 2.496 144⁶  | 4.457 400⁶
12 |             |             |             | 1.352 078⁶  | 2.704 156⁶  | 5.200 300⁶
13 |             |             |             |             |             | 5.200 300⁶

Since $C_0^m$ is equal to unity for every value of m it is not tabulated beyond m = 20.
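
The entries of this table can be regenerated by repeated addition. The following is a minimal sketch, assuming that $C_n^m$ denotes m!/(n!(m-n)!), as the small-m values indicate, and that entries with more than eight digits are rounded half up to eight significant figures, which the legible entries appear to follow. The names `pascal_rows` and `table_entry` are illustrative only, and the `e` exponent in the output is a plain-text stand-in for the printed superscript.

```python
from math import comb


def pascal_rows(max_m: int) -> list[list[int]]:
    """Build rows 0..max_m of Pascal's triangle by repeated addition.

    rows[m][n] is the exact coefficient for column m, row n of the table.
    """
    rows = [[1]]
    for m in range(1, max_m + 1):
        prev = rows[-1]
        # Interior entries follow C(m, n) = C(m-1, n-1) + C(m-1, n).
        row = [1] + [prev[n - 1] + prev[n] for n in range(1, m)] + [1]
        rows.append(row)
    return rows


def table_entry(value: int, sig: int = 8) -> str:
    """Format an exact integer as mantissa and power of ten, e.g. '1.84756e5'.

    Assumption: values with more than `sig` digits are rounded half up.
    """
    digits = str(value)
    exponent = len(digits) - 1
    drop = len(digits) - sig
    if drop > 0:
        quotient, remainder = divmod(value, 10 ** drop)
        if 2 * remainder >= 10 ** drop:  # round half up
            quotient += 1
        digits = str(quotient)
        if len(digits) > sig:  # rounding carried into a new leading digit
            digits = digits[:sig]
            exponent += 1
    mantissa = digits[0] + "." + (digits[1:] or "0")
    return f"{mantissa}e{exponent}"


rows = pascal_rows(35)
assert rows[20][10] == comb(20, 10)  # cross-check against the standard library
print(table_entry(rows[20][10]))  # 1.84756e5
print(table_entry(rows[30][14]))  # 1.4542268e8
print(table_entry(rows[35][18]))  # 4.5375677e9
```

Integer arithmetic is used throughout so that the rounding step never depends on floating-point precision, which matters for the larger columns where the exact values run to thirty digits.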


 n | m = 26      | m = 27      | m = 28      | m = 29      | m = 30
 1 | 2.6¹        | 2.7¹        | 2.8¹        | 2.9¹        | 3.0¹
 2 | 3.25²       | 3.51²       | 3.78²       | 4.06²       | 4.35²
 3 | 2.600³      | 2.925³      | 3.276³      | 3.654³      | 4.060³
 4 | 1.495 0⁴    | 1.755 0⁴    | 2.047 5⁴    | 2.375 1⁴    | 2.740 5⁴
 5 | 6.578 0⁴    | 8.073 0⁴    | 9.828 0⁴    | 1.187 55⁵   | 1.425 06⁵
 6 | 2.302 30⁵   | 2.960 10⁵   | 3.767 40⁵   | 4.750 20⁵   | 5.937 75⁵
 7 | 6.578 00⁵   | 8.880 30⁵   | 1.184 040⁶  | 1.560 780⁶  | 2.035 800⁶
 8 | 1.562 275⁶  | 2.220 075⁶  | 3.108 105⁶  | 4.292 145⁶  | 5.852 925⁶
 9 | 3.124 550⁶  | 4.686 825⁶  | 6.906 900⁶  | 1.001 5005⁷ | 1.430 7150⁷
10 | 5.311 735⁶  | 8.436 285⁶  | 1.312 3110⁷ | 2.003 0010⁷ | 3.004 5015⁷
11 | 7.726 160⁶  | 1.303 7895⁷ | 2.147 4180⁷ | 3.459 7290⁷ | 5.462 7300⁷
12 | 9.657 700⁶  | 1.738 3860⁷ | 3.042 1755⁷ | 5.189 5935⁷ | 8.649 3225⁷
13 | 1.040 0600⁷ | 2.005 8300⁷ | 3.744 2160⁷ | 6.786 3915⁷ | 1.197 5985⁸
14 |             | 2.005 8300⁷ | 4.011 6600⁷ | 7.755 8760⁷ | 1.454 2268⁸
15 |             |             |             | 7.755 8760⁷ | 1.551 1752⁸
 n | m = 31      | m = 32      | m = 33      | m = 34      | m = 35
 1 | 3.1¹        | 3.2¹        | 3.3¹        | 3.4¹        | 3.5¹
 2 | 4.65²       | 4.96²       | 5.28²       | 5.61²       | 5.95²
 3 | 4.495³      | 4.960³      | 5.456³      | 5.984³      | 6.545³
 4 | 3.146 5⁴    | 3.596 0⁴    | 4.092 0⁴    | 4.637 6⁴    | 5.236 0⁴
 5 | 1.699 11⁵   | 2.013 76⁵   | 2.373 36⁵   | 2.782 56⁵   | 3.246 32⁵
 6 | 7.362 81⁵   | 9.061 92⁵   | 1.107 568⁶  | 1.344 904⁶  | 1.623 160⁶
 7 | 2.629 575⁶  | 3.365 856⁶  | 4.272 048⁶  | 5.379 616⁶  | 6.724 520⁶
 8 | 7.888 725⁶  | 1.051 8300⁷ | 1.388 4156⁷ | 1.815 6204⁷ | 2.353 5820⁷
 9 | 2.016 0075⁷ | 2.804 8800⁷ | 3.856 7100⁷ | 5.245 1256⁷ | 7.060 7460⁷
10 | 4.435 2165⁷ | 6.451 2240⁷ | 9.256 1040⁷ | 1.311 2814⁸ | 1.835 7940⁸
11 | 8.467 2315⁷ | 1.290 2448⁸ | 1.935 3672⁸ | 2.860 9776⁸ | 4.172 2590⁸
12 | 1.411 2053⁸ | 2.257 9284⁸ | 3.548 1732⁸ | 5.483 5404⁸ | 8.344 5180⁸
13 | 2.062 5308⁸ | 3.473 7360⁸ | 5.731 6644⁸ | 9.279 8376⁸ | 1.476 3378⁹
14 | 2.651 8253⁸ | 4.714 3560⁸ | 8.188 0920⁸ | 1.391 9756⁹ | 2.319 9594⁹
15 | 3.005 4020⁸ | 5.657 2272⁸ | 1.037 1583⁹ | 1.855 9675⁹ | 3.247 9432⁹
16 | 3.005 4020⁸ | 6.010 8039⁸ | 1.166 8031⁹ | 2.203 9614⁹ | 4.059 9290⁹
17 |             |             | 1.166 8031⁹ | 2.333 6062⁹ | 4.537 5677⁹
18 |             |             |             |             | 4.537 5677⁹
n m = 36 m = 37 m = 38 m = 39 m = 40 
5 )3-6 eo 3.8 3.9) 4.01) 
2 |6.307 6.66 2 7.037 oiane 7.802 
8 AT-MOn., 7-770" 8.436 ° 9-139° ,  |9.880% 
4 5 Sone * 6.604 oe Weta 5 BINS Fi oe, 
Bee (3-709, 92. 2) 4-358 07° y bsvorg a2? 15.757 57 6. 580 08 
6 1.947 792° |2.324 784° | 2.760 681% Ais: 262 623° | 3.838 380° 
7 8.347 680 1.029 5472‘ | 1.262 0256" | 1.538 0937” |1.864 3560 ° 
8 3.026 0340 ‘13.860 8020 | 4.890 3492 _ |6.152 3748 |7.690 Bess) 
9 9-414 aah 1.244 0362 8 | 1.630 1164 2.119 15138 |2.734 3888 ® 
ro [2.541 8686° | 3.483 go1g | 4.727 3376 | 6.357 4540 |8.476 6053 
11 |6.008 0530, |8.549 9215 | | 1.203 3223 9 | 1.676 05609 | 2.311 Borg 9 
12 |1.251 67779 | 1.852 48309 |2.707 4751 | 3.910 7974 | 5.586 8535 - 
13 2.310 7896 | 3.562 4673 | 5.414 9503 | 8.122 4254 ee 1.203 3223 
14 3.796 2972 |6.107 0868 |9.669 5541 i 1.508 4504 °~|2.320 6930 
15 5-567 9026 19.364 1998 4.547 1287 "| 2.514 0841 | 4.022 5345 
16 |7.307 8721 | 1.287 5775 912.223 9974 | 3.771 1261 | 6.285 2102 
17 8.597 4966  |1.590 5369 |2.878 1143 | 5.102 1118 |8.873 2379 
18 9-075 1353 |1.767 2632 |3.357 8001 |6.235 9144 | 1.133 8026 
19 1.767 2632 |3.534 5264 |6.892 3264 |1.312 8241 
20 6.892 3264 |1.378 4653 
n m = 41 m = 42 m = 43 m = 44 m = 45 
1 1 1 1 1 
East 4.2 4-3 4-4 4.5 
2 8.202 8.61 2 9.03 7 t 9.46 2 i 9.907 ; 
3. | 1.066 of 1.148 o# 1.234 1 1.32444, 1.419 08 
H ali.ex2-70° ~ 1.419 Bo%: «)-234 To? 1.397 st 1.489 95°. 
5 7.493 98 8.506 68 9.625 98 1.086 008 1.221 759 
 6 | 4.496 388⁶ | 5.245 786⁶ | 6.096 454⁶ | 7.059 052⁶ | 8.145 060⁶
7 | 2.248 19407 | 2.697 83287 | 3.222 4114? | 3.832 0568 ° | 4.537 9620 7 
8 9.554 8245 1.180 3019” | 1.450 0851 1.772 3263 - 2.165 $320 
9 13-503 43578 | 4.458 9181 | 5.639 2200, | 7.089 3051 |8.861 6314 | 
to | 1.121 09942 | 1.471 44309 | 1.917 3348% | 2.481 2568 | 3.190 1873 


n m = 41 m = 42 m = 43 m = 44 m = 45 
11 13.159 46209 | 4.280 56149 | 5.752 00439 | 7.669 33919 |1-015 0596 1° 
12 |7.898 6549_| 1.105 8117 191 1.533 8678 19] 2. 109 0683 19) 2.876 0022 
13. [1.762 0076 192.551 8731 | 3.657 6848 | 5.191 5526 |7.300 6209 
14 _|3.524 0153 | 5.286 0229 | 7.837 8960 | 1.149 5581 14) 1.668 7133 14 
15 6.343 2275 |9.867 2428 |1.515 3266 11) 2.299 1162 13.448 6743 
16 |1.030 7745 !1| 1.665 0972 11/ 2.651 8215 | 4.167 1481 |6.466 2642 
17 1.515 8448 |2.546 6193 | 4.211 7165 |6.863 5380 |1.103 0686 12 
18 |2.021 1264 | 3.536 9712. | 6.083 5905 ~| 1.029 5307 17) 1.715 8845 
19 2.446 6267 | 4.467 7531 8.004 7243 1.408 8315 |2.438 3622 
20 2.691 2894 |5.137 9161 | 9.605 6692 | 1.761 0394 | 3.169 8708 
ar |2.691 2894 | 5.382 5787 | 1.052 0495 17 2.012 6164 | 3.773 6558 
22 1.052 0495 |2.104 0990 |4.116 7154 
23 4:116 7154 
n m = 46 m = 47 m = 48 m = 49 m = 50 
vied geet en peted et ol 
2 —11.035% esse 1.1283 1.1763 Baer 
3 rerso° 1.621 54 1.729 64 1.842 4 1.960 o# 
4 1.631 85° 1.783 65° 1.945 80° 2.118 76° 2.303 00 
5 1.370 754 1.533 939° | 1.712 304 1.906 884° |2.118 760% 
6 |9.366 819 _ | 1.073 75737 | 1.227 15127 | 1.398 38167 | 1.589 07007 
7 5a ct 4680 7 6.289 1499 = 7.362 9072 |8.590 0584 |9.988 4400 
8 | 2.609 32828 | 3.144 57508 | 3.773 4899 | 4.509 7807 § | 5. 368-7865 8 
9 1.101 7163 % | 1.362 6491 1.677 10669 | 2.054 45569 |2. 506 4337 
Io =| 4.076 3504 | 5.178 0668 16.540 7159 |8.217 8225 |1.027 2278 10 
tr | 1.334 0783 191.741 7134 19| 2.259 5200 19 2.913 5916 1913.735 3739 
12 | 3.891 0618 | 5.225 1401 | 6.966 8534 | 9.226 3735 | 1.213 gg65 11 
13. | 1.017 6623 111. 406° 7685 1") 1.929 2825 11) 2.625 9678 1113. 548 6052 
14 12.398 7754 | 3-416 4377 | 4.823 2062 16.752 4887 19.378 4566 
15 | 5.117 3876. | 7.516 1630 | 1.093 2601 17: 1.575 5807 172.250 8296 12 
16 |9.914 9385 |1.503 2326 17) 2.254 8489 |3.348 1090 | 4.923 68 
17 11.749 6950 172.741 1889 | 4.244 4215 | 6.499 2704 bea son 
18 | 2.818 9531 | 4.568 6481 | 7.309 8370 | 1.155 4258 511.806 3629 18 
1g | 4.154 2467 16.973 1998 }-1.154 1848 15/1.885 1685 | 3.040 5943 
20 5.608 2330 | 9.762 4797 | 1.673 5679 2.827 7527 |4.712 9212 
21 16.943 5266 |1.255 1760 18/9.031 4239 =| 3.904 6 
I 9919 732 7446 
22 7.890 3711 1.483 3898 2.738 5657 4-969 9897 A ae 
23 | 8.233.4307 | 1.612 3802 | 3.095 7700 | 5.834 3357 | 1.080 4325 14 
zs 1.612 3802 | 3.224 7604 | 6.320 5303 |1.215 4866 
5 1.264 1061 


6.320 5303 




n m = 51 m = 52 m = 53 m = 54 m = 55
1 1 1 1 1 

a4 yaa Lae ai fares ee 

: 132 Tay 1.431 1.485 
2 1 2f08215 * 2.210 of 2.342 6 2.480 4% 2.623 54 
4 12.499 00° 2.707 255 | 2.928 25 Bet62 58) |. eiato xy ° 
5 12.349 o60% | 2.598 960% | 2.869 685% | 3.162 s10% | 3.478 761 6 
6 1.800 94607 | 2.035 85207 | 2.295 74807 | 2.582 71657 |2.898 96757 
7 [4.157 75108 | 1.337 84568 | 1.541 43088 | 1.771 00568 | 2.029 27738 
8 16.367 6305 17.525 3815 | 8.863 2271 | 1.040 46589 | 1.217 56649 
9 |3-042 3124” | 3.679 0754 %| 4.431 6136® | 5.317 9363 |6.358 4021 
to | 1.277 7712 191 1.582 0024 19 1.949 g100 19] 2.393 0713 1% 2.924 8649 19 
11 | 4.762 6017 | 6.040 3729 | 7.622 3753 |9.572 2853. | 1.196 5357 11 
12 11.587 5339 11} 2.063 7941 11) 2.667 8314 11] 3.430 0689 111 4.387 2974 
13 14.762 6017 |6.350 1356 |8.413 9297 | 1.108 1761 12/1. 451 1830 1? 
14 | 1.292 7062 17| 1.768 9663 1°] 2.403 9799 '7| 3.245 3729 | 4.353 549° 
15 |3.188 6752 | 4.481 3814 |6.250 3478 | 8.654 3277 |1.189 g7or 13 
16 17-174 5193 __| 1.036 3195 13) 1.484 4576 13/2. 109 4924 13}2.974 9251 
17 11.477 1069 13} 2.194 5588 |3.230 8783 | 4.715 3359 |6.824 8282 
18  |2.790 0908 | 4.267 1977. |6.461 7566 |9.692 6349 |1.440 7971 14 
19 14.845 9472 | 7.636 0381 | 1.190 3236 14/ 1.836 4992 14/2.805 7627 
20 17.753 5156 | 1.259 9463 *4|2.023 ssor | 3.213 8737 | 5.050 3729 
21 1.144 5666 14/ 1.919 9181 | 3.179 8644 | 5.203 4145 |8.417 2882 
22 |1.560 7726 | 2.705 3392 | 4.625 2573 [7.805 1218 |1.300 853615 
23 11.967 9307 | 3-528 7033 |6.234 0425 | 1.085 9300 151.866 4422 
24 2.295 9191 | 4.263 8498 | 7.792 5531 |1.402 6596 |2.488 5895 
25 2.479 5927 |4-775 5118 |9-039 3616 |1.683 1915 | 3.085 8510 
26 12.479 5927 | 4-959 1853 |9.734 6971 | 1.877 4059 | 3.560 5973 
27 9-734 6971 |1.946 9394 | 3-824 3453 
28 3-824 3453 
n m = 56 m = 57 m = 58 m = 59 m = 60 
I Te Se OS te 5.97 6.6) 
2 8. 1.5962 653 * 1.7113 1.770 3 
2 2.772 o* 2.926 o# 3.085 64 3-250 9 3.422, 10 
4 3-672 90° |3.950 10° | 4.242 70” | 4.551 26°. | 4.876 35°, 
5 |3-819 816© | 4.187 106% | 4.582 116% | 5.006 386% | 5.461 512 
6 | 3.246 84367 | 3.628 82527 4.047 5358" | 4.505 74747 | 5.006 3860 7 
7 2.319 1740" |:2.643 8584 8 | 3.006 TASS 3.411 4945 ° | 3.862 0692 & 
8 | 1.420 4941? | 1.652 41159 | 1.916 7973 nol 2227 4714 sgl 2 558 6208 9 
9 lesley opts 8.996 cent 1.064 8874 19 1.256 5671 1% 1.478 3143 1 
10 3.560 7051 19| 4.318 3020 19] 5.217 9482 | 6.282 8356 17.539 4028 


n m = 56 m= 57 m = 58 m = 59 m = 60 
11 | 1.489 0222 11/1.846 0927 14 2.276 9229 12.798 7177 113.427 o013 1! 
12 15.583 8331 __|7.072 8552 | 8.917 9479 __| 1.119 4871 711.399 3588 1? 
13. | 1.889 9127 12/2.448 2960 17) 3.155 5816 17] 4.047 3764 15.166 8634 
14 | 5.804 7320. _|7.694 6447 || 1.014 2941 191 1.329 8522 111.734 5899 ' 
1g | 1.625 3249 13) 2.205 7981 13} 2.975 2626 | 3.989 5567 | 5-319 4089 
16 | 4.164 8952 |§.790 2201 | 7.996 0183 _| 1.097 1281 14/1. 496 0838 14 
I i Dy 464, Tis 48 Di 088 .872 2168 
7 19-799 7534 _,| 1-396 4649 4) 1.975 4869 1) 2.775 0887 | 3.872 216 
18 2.123 2799 *| 3.103 2552 | 4.499 7201. |6.475 2070 |9.250 2957 
19 | 4.246 5598 | 6.369 8397_ 19-473 0949. -|1.397 2815 1°) 2.044 8022 1° 
20 7.856 1356 | 1.210 2695 15/7847 2535 15|2.794 5630 | 4.191 8445 
ar 11.346 7661 15| 2.132 3797. | 3.342 6492 | 5.189 9027 |7.984 4657 
22 2.142 §824 ~| 3.489 3485 |5.621 7282 |8.964 3774 |1.415 4280 16 
23 | 3.167 2958 | 5.309 8782 |8.799 2268 | 1.442 0955 192.338 5332 
24 4.5355 O317. “1'7.$22, 3275. | 1.2830 2206 16) 9.163 1432 |3.605 2387 
25 5-574 4406 19.929 4723 |1.745 1800 |3.028 4oo5 | 5.191 5438 
26 6.646 4484 |1.222 0889 1% 0.216 0361 |3.960 2161 |6.988 6167 
27 73364 194276 shed) T3200 2.625 2280 |4.840 2641 |8.800 4802 
28 |7.648 6906 | 1.503 3633 |2.906-5024 | 5.531 7304 [1.037 1995 17 
2 1.503 3633 |3.006 7266 | 5.913 2291 |1.149 4960 
30 5.913 2291 {1.182 6458 
n m = 61 m = 62 m = 63 m = 64 m = 65
I ex + 6-04 6.31 64° 6. 

2 1.830% 12861" 1.9532 2.016 3 2.0803 

3. 13.599 of 3.782 of 3.971 1 4.166 44 4.368 of 

4 [5.218 55°. [5.578 45°. | 5.956 65° |6.353 76° |6.770 40° 

§ 15-949 147° 16.471 002° -|'7.028 847° [4.624 612% 18 arp gee 
6 | 5.552 53727 |6.147 45197 |6.794 55217 | 7.497 43687 |8.259 88807 
7 14.362 70788 | 4.917 9615 8 | 5.532 70678 | 6.212 16198 |6.961 9056 8 
8 [2.944 8278 7 3.381 0985 ® 3.872 89479 | 4.426 16549 | 5.047 38169 
9 | 1.734 1764 1° 2.028 6591 > 2.366 7690 ee 2.754 058519 3.196 6750 10 
10 |9.017 7170 | 1.075 1893 1) 1.278 0553 411.514 7321141 1.790 13801! 
11 [4.180 9415 14) 5.082 7132 |6.157 9026 |7.4 8 |8.950 6900 
12 |1.742 0590 17) 2.160 1531 17] 2.668 4244 12 ee oe i ee Sor Se 
13 16.566 2223 |8.308 2812 | 1.046 8434 19] 1.313 6859 15] 1.642 1074 13 
14 [2.251 2762 13)2.907 8984 13] 3.738 7266 | 4.785 5700 |6.099 2559 
15 |7.053 9988 | 9.305 2750 | 1.221 3173 1411. 595 1900 !4/9.073 7470 14 
16 | 2.028 0247 14) 2.733 4245 141 3.663 9520 | 4.885 2694 |6.480 4594 
17 [5.368 3005 | 7.396 3252 | 1.012 9750 19) 1.379 3702 151.867 8g71 15 
18 1.312 251219) 1.849 0813 45] 2.588 7138 | 3.601 6888 4.981 o590 
19 2.969 8318 | 4.282 0830 [6.131 1643 | 8.719 8781 _ | 1.232 1567 +8 
20 16.236 6467 | 9.206 4785 [1.348 8561 16] 1.961 9726 162.833 g604 




n m = 61 m = 62. m = 63 m = 64 m= 65 
21 1.217 6310 181 1.841 2957 16 2.761 9435 16| 4.110 7997 1§|6.072 7723 16 
22 2.213 8746 | 3.431 5056 5.272 8013 8.034 7448 |1.214 geant 
23 13-753 9613 | 5.967 8358 | 9.399 3415 | 1.467 2143 17|2.270 6888 
24 | 5-943 7729 |9.697 7332 __| 1.566 5569171 2.506 agit | 3.973 7053 
25 8.796 7825 | 1.474 0555 17/ 2.443 8288 | 4.010 3857 |6.516 8767 
26 | 1.218 01601”) 2.097 6943 | 3.571 7498 | 6.015 5785 [1.002 5964 18 
27 1.578 9097 12.796 9257 | 4.894 6200 | 8.466 3698 |1.448 1948 
28 ,|1.917 2475 | 3.496 1572 | 6.293 0829 | 1.118 7703 181.965 4073 
29 2.101 6954 | 4.098 9429 |7.595 1000 |1.388 8183 |2.507 5886 
*30 2.327 1418 | 4.508 8372 | 8.607 7801 |1.620 2880 | 3.009 1063 
31 [2 527 1418 14.654 2835 | 9.163 1207 |1.777 ogor 13.397 3781 
32 g.163 1207 | 1.832 6241 |3.609 7142 
33 3.609 7142 


n m = 66 m = 67 m = 68 m = 69 m = 70 

i 16.6" 6.7" 6.81 6.9) 4.0} 
betes Peet 2.278 3 2.346? 2.416% 

3 14:576.0°. |4.foo5* |5.o1r 6* | 5.239°4 5.474 04 

4 Fis207/20 7.664 805 8.143 85° 8.645 or 5  |9.168 95 ® 

5 8.936 928° | 9.657 6486 |1.042 41287 | 1.123 85137 | 1.210 30147 
6 |9.085 87687 | 9.979 56967 |1.094 5334° [1.198 747° |1.311 15998 
7 17-787 89448 | 8.696 4821 ® | 9.694 4390 | 1.078 8972° |1.198 7747 ® 
8 15.743 57219 16.522 36169 | 7.392 0098 9 | 8.361 4537 1 9442 3509 

9 |3-701 4131 114.275 7704 19| 4.928 o065 11) 5.667 2075 16.503 3529 10 
10 | 2.109 8055 14) 2.479 9468 11) 2.907 5238 11] 3.400 3245 113.967 0452 1 
tr | 1.074 0828 12) 1.286 0633 12] 1.533 0580 12] 1.823 8104 17/2.163 8429 12 
12 14.922 8795 .|5.996 9623 _|7.282 0256 |8.815 0836 ee 1.063 8894 18 
13 | 2.044 8884 18! 2.537 1763 13) 3.136 8726 19] 3.865 0751 [4-746 5835 - 
14 17-741 3632 |9.786 2516 [1.232 3428 14] 1.546 ogor 14/1.932 5376 
15 12.683 6726 14/3.457 8089 14] 4.436 4341 | 5.668 7769 |7.214 8069 
16 |8.554 2064 |1.123 7879 15) 1.469 5688 15|1.913 2122 19/2. 480 0899 15 
17 12.515 94301°|3.371 3637} 4.495 1516 || 5.964 7204 | 17-877 9326, 
18 6.848 9561 |9.364 8991 _] 1.273 6263 °°/ 1.723 1414 °°} 2.319 6135 
19 |1.730 2626 182.415 158219) 3.351 6481 | 4.625 2744 re 6.348 4158 
20 |4.066 1171 |5.796 3797 | 8.211 5379 | 1-156 3186 ©") 1.618 8460 
ar |8.906 7327 | 1.297 2850 !7|1.876 9229 172.698 0767 | 3.854 3953 
22 as Sel 17) 9.912 5049 | 4.009 7899 | 5.886 7129 | 8.584 7896 - 
23 3-485 2432 15.307 0749 |8.019 5798 | 1.202 9370 "1.791 6083 
24 6.244 3941 19.729 6373 __| 1.503 6712 °°| 2.305 6292 | 3.508 5662 
25 | 1.049 0582 18] 1.673 4976 18) 2.646 4613 | 4.150 1326 | 6.455 7618 


n m = 66 m = 67 m = 68 m = 69 m = 70 
26 | 1.654 2841 182.703 3234 18] 4.376 8399 18] 7.023 301g 181.117 343419 
27 2.450 7913 | 4.105 0753 | 6.808 4177 1.118 §258 19/1820 8559 
28 3.413 6021 5.864 3934 | 9-969 4687 ¥ 1.677 7886 |2.796 3174 
29 | 4.472 9959 | 7.886 5980 | 1.375 0991 **| 2.372 0460 | 4.049 8346 
30 [5.516 6949 | 9.989 6908 =| 1.787 6289 | 3.162 7280 | 5.534 7740 
31 | 6.406 4844 | 1.192 3179 }9| 2.191 2870 | 3.978 9159 17.141 6438 
32 | 7.007 0923 | 1.341 3577 | 2-533 6756 | 4-724 9626 | 8.703 -8785 
33 7.219 4284 |1.422 6521 | 2.764 0097 | 5.297 6853 | 1.002 2648 
34 1.422 6521 | 2.845 3041 | 5.609 3139 | 1.040 6999 
35 5.609 3139 |1.121 86286 
n m= 71 m= 72 m= 73 m = 74 m= 75 

1 1 1 1 1 
a He es 2 Gee sy a6 
2 largss 2.556% 2.628 3 2.701 3 2.4753 
oie ee cue 5.964 o* 6.219 64 6.482 44 6.752054 
4 9.716 35° 1.028 7908 | 1.088 4308 |1.150 626% |1.215 450% 
5 1.301 9909? | 1.399 144 |1.502 0334’ | 1.610 87647 1.725 9390 * 
6 1.432 1900 § 1.562 3891 © 1.702 3045 8 | 1.852 5079® |2.013 5955 6 
7 | 1.329 8907" 11.473 1097* | 1.629 3486“ | 1.799 5791 °° |1.984 8299 
8 11.063 9126 291.196 gor16 1% 1.344 2126 29% 1.507 1475 191.687 1054 1° 
g |7-447 3879 __|8.s11 3005 |9.708 2021 | 1.105 2415 “11.066 gs6a44 
10 | 4.617 3805 111 5.362 1193 11]6.213 2494 14) 7.184 0696 | 8.289 3111 
11 | 2.560 5474 a 3.022 2854 3.558 4974 4.179 8223 - 4.898 2293 3 
12 1.280 2737 °°| 1.536 3284 °°) 1.838 5570 °°| 2.194 4067 -”| 2672-3889 
13. | 5.810 4729 _ |7.090 7466. |8.627 0750 | 1.046 5632 14/1266 0039 !4 
14 | 2.407 1959 !4/ 2.988 2432 ne 3.697 3179 _ 4.560 0254 13] 5-506 5886 a 
15 9-147 3445 1.155 4540 - 1.454 2784 1.824 OIOI ~*| 2.280 0127 
16 | 3.201 570615] 4.116 3050 | 5.271 7591 _|6.726 0374 |8.550 0476 
17 | 1.035 8022 18! 1.355 9593 19| 1.767 5898 182.294 7657 82.967 3695 16 
18 3.107 4067 | 4.143 2090 | 5.499 1683 | 7.266 7581 _ |9.561 5238 
19 | 8,668 0293) 1.177 5436 17 7.sg1 8645 1”) 2.141 781347] 2.868 4571 
20 =| 2.253 6876 *"| 3.120 4906 =| 4.298 0342 | 5.889 8987 | 8.031 6800 
at | 5.473 2414 | 7.726 g2g0__|1.084 742018) 1.514 5454 182.103 5352 18 
Gp) 1.243 9185 18} 1.791 2426 18! 9.563 9355 3.648 6775 | 5.163 2228 
23 | 2.650 0872 | 3.894 0057 | 5.685 2483 |8.249 1839 |1.189 7861-19 
24 | 5.300 1744 17.950 2617 o| 1-184 4267 191 1.752 9516 190.577 8700 
25 19.964 3279 | 1.526 4502 °*| 2.321 4764 | 3.505 9031 | 5.258 8547 
26 | 1.762 9196 19) 2.759.3524 | 4.285 8026 |6.607 2790 _|1.011 3182 20 
27 | 2.938 1993 | 4.701 1188 | 7.460 4712 | 1.174 6274 2% 1.835 3553 
28 [4.617 1703 [7.555 3695 | 1.225 6488 2% 1.971 6960 | 3.146 3233 
29 | 6.846 1490 | 1.146 3319 2°] 1.901 8689 | 3.127 5177 | 5.099 2137 
30 | 9-584 6086 | 1.643 0758 | 2.789 4077 | 4.691 2766 | 7.818 7943 




n m = 71 m = 72 m = 73 m = 74 m = 75
31 | 1.267 6418 79| 2.226 1027 20) 3.869 1784 296.658 5861 291.134 9863 22 
Sy) 1.584 $522 | 2.852 1940 | 5.078 2967 |8.947 4751 |1.560 6061 
33 | 1-872 6526 | 3.457 2049 | 6.309 3989 | 1.138 7696 24) 2.033 si71 
34 2.092 9647 | 3.965 6174 [7.422 8222 [1.373 2221 | 2.511 9917 
35 2.212 5627 | 4.305 5274 | 8.271 1448 |1.569 3967 |2.942 6188 
36 2.212 5627 | 4.425 1254 | 8.730 6528 |1.700 1798 |3.269 5765 
37 8.730 6528 | 1.746 1306 |3.446 3103 
38 3-446 3103 
n m= 76 m= 77 m = 78 m = 79 m = 80 

Iya) 7.6" apt 7.81 eee 8.01 

2. |2.8r6* 2.926% 3.003 3.081 2 3.1603 

3 7.030 O ye Nao Na 7.607 64 7.907 9* 8.216 of 

4 282 975 1.353 275 1.426 425° [1.502 sor® |1.581 5806 
g [1.847 48407 | 1.975 8157 | 2.111 10907 | 2.253 75157 |2. 404 C016 

6 | 2.186 1894% | 2.370 93788 | 2.568 51608 | 2.779 62698 | 3.005 00208 
7 [2.186 18949 | 2.404 80839 |2.641 goat? | 2.898 75379 13.176 71649 

8 | 1.885 5884 19 2.104 2073 19) 2.344 6881 112.608 8783 1% 2.898 7537 10 
g |1.424 6668 11! 1.613 2256 11) 1.823 6463 14) 2.058 1151 110. 319 C030 
to 19.545 2673 | 1.096 9934 17] 1.258 3160 17) 1.440 6806 17/1.646 4gat 12 
rr | 5.727 1604 116.681 6871 13| 7°78 6805 13/9035 9965 wa) L047 7877 13 
12 | 3.102 2119 13/ 3.674 9279 19) 4.343 0966 19) 5.120 9647 1916.024 6643 
13. | 1.527 2428 14/1.837 4640 142.204 9567 14| 2.639 2664 14) 3.151 3629 14 
14 |6.872 5924 -|8.399 8352 | 1.023 7299 15] 1.244 2256 151. 508 152215 
15 | 2.840 6715 19) 3.527 9308 }°) 4.367 9143 | 5.392 5442 [6.635 8698 
16 | 1.083 0060 18] 1. 367 0732 16 1.719 8663 19) 2.156 6577 18 2.695 8221 16 
17 | 3.822 3742 _|4.905 3802 | 6.272 4534 | |7.992 3197 __|1.014 897717 
18 | 1.252 8893 17| 1.635 1267 !7| 2.125 6648 17) 2.752 gtor 17/3. 552 1421 
19 13.824 6095 |5.077 4988 |6.712 6256 | 8.838 2904 | 1.159 1200 18 
20 | 1.090 0137 18] 1.472 4747 18| 1.980 2245 18) 2.651 4871 181 3.535 3161 
21 |2.906 7032 |3.996 7169 | 5.469 1916 | 7.449 4162 | 1.010 0903 19 
22 17/266 7581 | 1.017 3461.9] 1.417 0178 19] 1.963 9370 19|2.708 8786 
23 {1.706 1084 19} 2.432 7842 |3.450 1304 | 4.867 1482 Fe 6.831 0852 o 
24 13.767 6561 | 5.473 7645 : 7.906 5487 Be 1.135 6679 “| 1.622 3827 
25 7.836 7247 |1.160 4381 “"| 1.707 8145 “"| 2.498 4694 | 3.634 1373 
26 | 1.537 2037 29) 2.320 8762 | 3.481 3142 | 5.189 1288 | 7.687 5982 
27 | 2.846 6735 | 4.383 8772 | 6.704 7533. | 1.018 60687111. 537 5196 1 
28 |4.981 6786 | 7.828 3521 | 1.221 222971|1.891 6983 | 2.910 3050 
a9 18.245 5370 __|1.322 721671|2.105 5568 | 3.326 7797 | 5.218 4780 
30 | 1.291 8008 24) 2.116 3545 | 3.439 0761 | 5-544 6328 8.871 4125 


n m = 76 m= '77 m = 78 m = '79 m = 80 
31 | 1.916 8657 21) 3.208 6665 21] 5.325 021071) 8.764 0971 21) 1.430 8730 Ag 
32 2.695 5924 | 4.612 4581 | 7.821 1246 | 1.314 6146 “| 2.191 0243 
33. | 3-594 1232 | 6.289 7156 | 1.090. 2174 97| 1.872 3298 | 3.186 9444 
34 4.545 5087 | 8.139 6319 1.442 9348 2.533 1521 |4.405 4819 
35 5.454 6105 | 1.000 O1Ig 211.813 9751 | 3-256 9099 | 5-790 0620 
36 6.212 1953 | 1.166 6806 |2.166 6925 | 3.980 6676 | 7.237 5775 
37 6.715 8868 | 1.292 8082 | 2.459 4888 | 4.626 1813 | 8.606 8489 
38 6.892 6206 |1.360 8507 | 2.653 6589-1 5.113 1477 | 9-739 3290 oe 
39 1.360 8507 |2.721 7015 | 5.375 3604 i 048 8508 
40 5-375 3604 | 1.075 0721 

n m = 81 m = 82 m = 83 m = 84 m = 85 
eo ies te 8.21 $232 8.42 8.57, 

2 | 3.2408 q2g2t 3 3.403% 13-486" ice. 

a 8.532 of 8.856 of 9.188 I 9.528 44 9.877 0 

4. | 1.663 740% |1.749 0608 | 1.837~6208 | 1.929 5018 _} 2.024 785° 

5 2.562 15967 | 2.728 5336 7 | 2.903 4396” | 3.087 2016" | 3.280 15177 

6 |3.245 40228 | 3. sor 6181 ® | 3.774 4715 © | 4.064 8154 ° | 4.373 53568 

Te 41) 2 Pe Oe 1 4.151 9186” | 4.529 3658" | 4.935 8473 

Boo [3-216 4254. || 3.554 1470 1 | 3.944 322721 4-359 SEA. ee aa 

9 2.608 8783 1 Bese 5209 111 3.286 9356 11) 3.681 3679 14) 4.117 3193 1 
10 | 1.878 3924 !2| 2.139 2802 17] 2.432 3323 12] 2.761 0259 17) 3.129 5193 9 
11 [1.212 4169 13} 1.400 2562 13} 1.614 1842 19) 1.867 4174 19} 2.133 5200 18 
12 17.072 4320, || 8.284 8489 9.685 1051 a] 1-229 9289 14) 5 315 6707 = 
TZ: 13.753 8200), | 40408 C7250 515 1289 5741 | Os? 0b70 Aiea yeas 
14 | 1.823 2885 °°) 2.198 6714 "| 2.644 7787 ial Sd 2344 | gl 3-200 eee 
15 18.144 0220 | 9.967 3106 | 1.216 5982 °°) 1.481 0761 *"| 1.798 4495 
16 | 3.359 4091 19) 4.173 8113 19] 5.170 5424 | 6.387 1406 |7.868 2166 
17 11.284 4799 1”| 1.620 4206 17) 2.037 802017] 2.554 8562 171 3.193 5703 17 
18 | 4.567 0398 |5.851 5198 17.471 9406 | 9.509 7426 | 1.206 4599 1 

1 8 18 18 

19 |'T.514 3343 "| 1.971 0382 °°/ 2.556 1902 °°] 3.303 3843 714.254 3585 
20 | 4.694 4362 | 6.208 7704 | 8.179 8087 | 1.073 5999 !% 1.403 9383 19 
21 | 1.363 6219 19) 1.833 0656 19} 2.453 9426 | 3.071 9235 | 4.345 5234 
22 | 3.718 9689 | 5.082 5909 ., 6.915: 6564 | 9.369 5990 | 1.264 1523 
23. [9.539 9638 | 1.325 893379) 1.834 15242 2.525 718079 3. 462 6779 
24 | 2.305 4912 “| 3.259 4876 | 4.585 3809 | 6.419 5333 | 8.945 2513 
25 | 5.256 5200 | 7.562 0113 | 1.082 1499 21) 1540 6880 2} 9.782 6413 at 
26 | 1.132 1735 21) 1.657 8256 21] 0.414 0267 | 3.496 1766 | 5.036 8646 
ay 2.306 2794 | 3.438 4530 | 5.096 2785 |7.510 3052 |1.100 6482 22 
28 «14.447 8246 | 6.754 1041 | 1-019. 2557 22) 1.528 8836 27) 2.279 gt4i 
29 8.128 menos 1.267 6608 2 2) 1.933 0712 | 2. 952 3269 | 4.481 2104 
30 | 1.408 9890 77] 2.221 8673 | 3.479 5281 5-412 §993 | 8.364 9262 




n m = 81 m = 82 m = 83 m = 84 m = 85 
31 | 2.318 0142 22) 3.727 0032 22| 5.948 8706 22) 9.428 3988 22/1484 0998 23 
32 | 3-621 8973 | 5.939 9115 | 9.666 9148 | 1.561 5785 23/2. 504 4184 
33 | 5-377 9687 | 8.999 8659 | 1.493 9777 73| 2.460 6692 | 4.022 2478 
34 7-592 4263 1.397 0395 73 2.197 0261 | 3.691 0038 |6.151 6730 
35 | 1.019 554479) 1.778 7970 | 3.075 8365 | 5.272 8626 |8.963 8664 
36 | 1.302 7639 | 2.322 3183 | 4.101 1154 | 7.176 9519 | 1.244 9815 24 
37 1.584 4426 | 2.887 2066 | 5.209 5249 |9.310 6403 |1.648 7592 
38 1.834 6178 | 3.419 0604 |6.306 2670 | 1.151 5792 2412.08 6432 
39 2.022 7837 | 3-857 4015 | 7.276 4619 | 1.358 2729 | 2.509 8520 
40 2.123 9229 | 4.146 7066 | 8.004 1081 | 1.528 0570 | 2.886 3299 
41 2.123 9229 | 4.247 8458 | 8.394 5524 |1.639 8661 | 3.167 9231 
42 -394 5524 [1.678 gt05 | 3.318 7765 
43 3 318 7765 


n m = 86 m = 87 m = 88 m = 89 m = 90
I 8.61 Siz B28.) ee GO 
a- 13.6562 3.74r° 3.8283 Bote) ee ila coGe : 
3 1.023 40 1.059 95° 1.097 36° 1.135 64 is 1.174 80 : 
4 |2.123 555% |2.225 8958 | 2.331 890 e 2.441 626° | 2.555 190 : 
§ | 3-482 63027 | 3.694 98577 | 3.917 57527 | 4.150 76427 | 4.394 9268 
1388 . 699 8 |6.226 1463 8 
6 4.701 55088 | 5.049 81382 | 5.419 31248 | 5.811 0699 : 463 
7 15.373 20099 | 5.843 35609 16.348 33732 |6.890 26869 |7.471 3756 9 
8 15.306 0359 19] 5.843 3560 1916. 427 6916 19} 7.062 5253 : 7.751 5521 10 
g 14.598 5644 !1| 6.129 168011) 6 713 5036 1116.356 2728 ie 7.062 5253 : 
10 | 3.540 8946 12} 4.000 7510 !7| 4.613 6678 !7| 5.085 0182 17) 5.720 6455 
Ir | 2.446 4363 13) 2.800 5257 13) 3.200 6008 - 3.651 9676 a 4.160 4694 a 
12 |1.529 0227 14) 1.773 6663 1%|2.053 7189 | 2.373 7790 72.738 9757 1° 
13. |8.703 6675 | 1.023 2690 !5| 1.200 6356 511.406 0075 15) 1.643 3854, 
14 14.538 3409 19| 5.408 7077 | |6-431 9767 | 7-632 6123 | |9.038 6199 
15 2.178 4036 1©| 2.632 2377 °°) 3.173 1085 °°} 3.816 3062 °°) 4.579 5674 
16 |9.666 6661 | 1-184. 5070 17) 1.447 7308 Tat 76 ae 17) 9.146 6722 17 
I .980 3919 17| 4.947 0586 |6.131 5655 17.579 2963 19.344 3379 
18 ee te) 1.923 8561 18) 2.418 5620 !8| 3.031 7185 3.789 6481 + 
19 5.460 8184 |6.986 6353 | 8.910 4914 - 1.132 9053 "|1.436 0772 
20 |1.829 3742 19) 2.375 4560 19| 3.074 1195 19] 3.965 1687 | 5.098 0740 
.302 8411 29 1.699 3580 2° 
21 -749 4617 _ | 7.578 8358 |9.954 2919 | | 1.302 84 
22 1 a 7046 29] 2.273 6508 79) 3.031 5343 791 4.026 9635 | 5.329 8047 " 
8.699 18 1.173 6720 7111.575 7683 
23 | 4.726 8302 |6.425 5347 ee ie 
24  |1.240 7929 21| 1.713 4759 71| 2.356 0294 ~"| 3.225 9480 | 4.399 0199 ys 
25  |3.077 1664 | 4.317 9593 [6.031 4353 | 8.387 4647 | 1.161 3413 


n m = 86 m = 87 m = 88 m = 89 m = 90
26 |7.219 505g >) 1.029 6672 22| 1.461 4632 27| 2.064 6067 2%| 2.903 3532 7” 
bi) 1.604 3346 27] 2.326 2852 13.355 9524 | 4.817 4156 se 6.882 0223 - 
28 3.380 5623 | 4.984 8969 Sars 1821 A 1.066 7135 ~"|1.548 4550 
29 6.761 1245 |1.014 1687 “9| 1.512 6584 “| 2.243 7766 |3.310 4900 
30 |1.284 6137 79) 1.960 7261 | 2.974 8948 | 4.487 5532 6.731 3297 
31 2.320 5924 | 3.605 2061 | 5.565 9322 | 8.540 8270 , | 1.302 8380 mabe 
32 =| 3.988 5182 | 6.309 1106 jako 24 3167 $5 1.548 0249 ~*|2.402 1076 
33 6.526 6662 Si 1.061 5184 “*| 1.682 4295 “*| 2.673 8612 |4.221 8861 
34 1.017 3921 74| 1.670 0587 | 2.721 5771 ~}4.404 0066 | 7.077 8678 be 
35 1.S1I 5539 | 2.528 g460 | 4.199 0047 |6.920 5819 | 1.132 4589 
36 2.141 3681 | 3.652 9220. |6.181 8681 1.038 0873 25] 1.730 1455 
37 2.893 7407 -| 5.035 1088 | 8.688 0308 a 1.486 9899 |2.525 0772 
38 3.731 4024 16.625 1431 1.166 0252 “| 2.034 8283 |3.521 8182 
39 14-592 4952 | 8.323 8977_ | 1.494 9041 | 2.660 9293 | 4.695 7575 
40 §.396 1819 | 9.988 6771 | 1.831 2575 | 3.326 1616 | 5.987 0908 
41 6.054 2530 | 1.145 0435 75} 2.143 9112 | 3.975 1687 | 7.301 3302 
42 6.486 6996 | 1.254 0953 | 2.399 1387 | 4-543 0499 [8.518 2186 
43 16.637 5531 | 1.312 4253 | 2.566-5205 | 4.965 6593 | 9.508 7092 
44 Tet? 4259 24624 8506 9 | ch iokes7i ke Si eo ge7oso 
45 5.191 3711 | 1.038 2742 
n m = 91 m = 92 m = 93 m = 94 m = 95

1 1 al 1 1 
ae 9-2 9-3 9-4 9-5 
2 4.095% : 4.186 3 : 4.2783 4.371° re Ty 
3 [1.214 85° 1.255 80 1.297 66°. 1.340 44°) [nig8h ase 
4 | 2.672 670", | 2.794 155", |2.919 735°, |3.049 sor | 3.183 545° 
5 4.650 4458" | 4.917 7128" | 5.197 12837 | 5.489 1018 | 5.794 O519 7 
6 16.665 63908 | 7.130 68368 | 7.622 45488 | 8.142 16778 8.691 0779 8 
8 9 9 10 
7 093 99029 8.760 5541 | 9°473 62249 | 1.023 58681%1.105 co8s } 
8 | 8.498 6897 1 9.308 0887 1 1.018 4144! 1.113 150614 1.915 5093 1! 
9 | 7-837 6805 ° | 8.687 s49s °°] 9-618 3583] 1.063 6773,7|1.174 9923 12 
10 | 6.426 8980 '7| 7.210 6661 17! 8.079 4210 17! 9.041 2569 | 1.010 4934 13 
rr | 4.732 5340 13] 5.375 2238 136.096 2904 13] 6.904 2325 1317-808 3682 
12 | 3.155 0227 ee 3.628 2761 14) 4.165 7984 14] 4.775 pe 5 tab ee = 
13 | 1.917 2830 1°) 2.232 7853 # 2.595 6129 15) 3.012 1927 15/3. 489 7355 15 
14 | 1.068 2005 19) 1.259 9288 18 1.483 2074 19] 1.742 7686 190.043 9879 } 
15 15.483 4294 |6.551 6299 17.811 5587 | 9.294 7661 |1.103 7535 17 
16 | 2.604 629017! 3.152 9719 !7| 3.808 1349 17| 4.58 17; 
2 : 589 2908 518 76’ 

17 [1.149 1010 18) 1.409 5639 18 1.724 8611 18) 2.16% 6746 18 ore a 18 ~ 
18 | 4.724 0819 19/5873 1829 ol 7:282 7468 |9.007 6079. |1.111 3283 19 
19 | 1.815 042079! 2.287 4502 19] 2.874 7685 19] 3.603 0432 194.503 Bo4o 
20 16.534 1512 | 8.349 1932 | 1.063 6643 7% 1.351 1412 2% a aI1 4au5~ 


n m = 91 m = 92 m = 93 m = 94 m = 95
a 2.209 1654 20 2.862 5805 20 6 e sed ee 
fread faa gat | SISAL TS easel ws gst 
: 2.811 66cr 2! -579 8408 “*|2 2 
24 15.974 788 8 : 3-735 4979 “O55 9573 °" 
as |t.601 cae 99 a atk 4| 1-089 5202 22 — fe - B. 525 4296 
: 3-007 0758 |4.096 5960 “957 6289 
2 4.064 6944 66 b- 589 see 
27 W9.785 3755 : ~ 9377 oe 7.864 6598 | 1.087 1736 73 
a8 | 2.296 6672 23 .385 0070 73) 1.951 6008 23} 2 736 73| 1.496 8332 23 
86 oe $7 3.218 1948 | 4.600 2018 738 0667 | 3.825 2403 
30 4.858 9451 7.095 6023 1.0 24 6.551 8025 |9.289 86 
1.004 1820 24] 1.490 0765 24 ane nee 1.491 0999 74) 2.146 sey 24 
31 11.975 9710 | 2.98 eat ate oy 
.980 1530 
ie else a ie eee ce, 
34 an oe 5 1.032 8939 1.600 9846 25 wie Haig 22-99 2105 °° 
: re ; S 
Be fig gn) Ime sk | tas ay cb ae. | bs 
36 | 2.862 6 a Eig 1201 hike 
dl - 2 6043 |4.702 8500 | 7.673 0710 
et care ie ee LS iene he tucege 
39 8. 46 8953 | 1.030 2118 76/1 741 TY iG 2 949 3748 og 292 9415 
.217 5757 | 1.426 4471 ce 9945 |2.924 0622 |4.873 
40 | 1.068 2848 76) 1.890 0424 BY: Goda) 208 C594 17.182 hee 
41 | 1.328 8 3-316 4895 15.773 1484 19-97! for8 
: 421 2.397 126 
43 Eg pees 7970 se ae 7-603 6589 1.337 6807 *7 
4 g2 3.384 6 ‘ 9-595 2934 ae 
44 | 1-966 5740 ae ay 6.295 4447 |1.160 3369 27 nes 
45 2.053 9772 |4.020 5512 ae 9144 11.344 9359 eb: on 
om 7-789 8179 | 1.494 3732 | 2.839 3092 
2.053 9772 4.107 9545 8 8 
om aL Be it Pane ae 
-128 5057 | 1.6 ; 
1.625 7oIl 3-217 $335 
: : % 
n m = 96 m = 97 m = 98 m = 99 m = 100
as wig 9.81 Lee ae ome 
4.560 3 : 2. 2 
3. | 1-428 80% $0500 4.7535 48513 ers 
rf Slaeee 1.474 40 1.520 96> |1.56 Pg, see 
ras G80 3.464 840% | 3.612 2808 ae i Bete 
5 “112 40647 16.444 60247 |6 7 3-764 376® |3.921 225% 
: : -791 0864 | 7.152 3144 ° | 7.528 75207 
9-270 4830" 9. 88x 72378 | 1.052 61849 9 
‘ 1.191 9192 191.284 6241 19! 1.383 og 1.120 52939 |1.192 05249 
1.326 o102 1.445 2021 11 fs rad or 1.488 7032 10! 5 600 7561 10 
9 | 1.296 5433 12} 1.429 1443 12 ete ae i 1.712 0086 1/1860 8789 11 
fe) 1.127 9926 13} 1.257 6470 1a\ 5. 4g 27) 1 7B0 e909 12) 49a 2gx8 7? 
400 5614 Te§57 92 13 
9279 131.731 0309 18 




III. THE BINOMIAL COEFFICIENTS, C_n^m
n m = 96 m= 97 m = 98 m = 99 m= 100 
11 |8.818 8516 13|9.946 8442 13|1.120 44g1 14) 1.260 5053 '4|1.416 2980 _ 
12 |6.246-6865 14|7.128 5717 14/8.123 2561 |.9.243 7052 |1.050 4211} 
13 14.036 3205 15| 4.660 9892 15] 5.373 8464 15] 6.186 17201}7.110 5425 
14 12.392 9615 18) 2.796 5935 19] 3.262 6924 19] 3.800 0771 19) 4.418 6943 1 
15 |1.308 1523 17)1.547 4484 17/1 1.827 1078 1") 2.153 3770 ')2.533 3847 17 
16 [6.622 5208 |7.930 6731 __|9.478 1215 | 1.130 5229 18) 1.345 8606 18 
17 [3.116 4804 18) 3.778 7325 18) 4.571 7998 18| 5.519 6119 |6.650 1349 
18 11.367 7886 19| 1.679 4367 19|2.057 3099 19|.2.514 4899 19) 3.066 4511 
19 5.615 1322 |6.982 9208 |8.662 3575 |1.071 9667 20/7 323 4167 
20 «| 2.161 8259 29|.2.923 3391 2) 3.421 6312 29] 4.287 8670 -| 5.359 8337 
a1 |7.823 7509 9.985 5768 | 1.270 8916 7") 1.613 0547 742.041 8414 2! 
22 | 2.667 1878 213.449 5629211 4.448 1206 | 5.719 0122 |7.332 0669 
23 [8.581 3869 | 1.124 8575 221 1.469 8138 27| 1.914 6258 27) 2.486 5270 22 
24 2.610 1718 22] 3.468 3105 |4.593 1680 |6.062 9817 _|7.977 6076 
as |7.517 2949 | 1.012 7467 23/1.359 5777 73] 1.818 8945 79|2.425 1927 23 
26 | 2.052 7998 23) 2.804 5292 | 3.817 2759 _ | 5.176 8536 16.995 7482 
27 | 5.322 0734 | 7.374 8732 | 1.017 9402.24] 1.399 6678 241 1.917 3532 24 
28 | 1.311 5110 24/1.843 7183 24| 2.581 2056 | 3.599 1459: 14.998 8137 
29 13.075 2671 | 4.386 7780 |6.230 4963 |8.811 7oI19 |1.241 0848 25 
30 | 6.868 0965 + |9.943 3635 | 1-433 0142 251 2.056 0638 75) 2.937 2340 
31 | 1.462 2399 75) 2.149 0495 75) 3.143 3859 | 4.576 4000 | 6.632 4638 
32 [2.970 1748 14.432 4147 16.581 4642 | 9.724 8501 {1.430 1250 26 
33 | 5-760 3390 | 8.730 5137 | 1.316 2928 96 1.974 4393 79] 2.946 9243 
34 | 1.067 3569 29! 1.643 3908 26) 2.516 4422 | 3.832 7350 | 5.807 1743 
35 [1.890 7466 | 2.958 1035 | 4.601 4943 17.117 9365 |1.095 0672 27 
36 | 3.203 7650 | 5.094 5115 | 8.052 6150 | 1.265 4109 2"| 1.977 2046 
37 [5-195 2946 8.399 0596 | 1.349 3571 27| 2.154 6186 | 3.420 0295 
38 | 8.066 3784 | 1.326 1673 27| 2.166 0733 13.515 4304 | 5.670 0490 
39 11-199 615377) 2.006 2531 | 3.332 4204 | 5.498 4937 |9.013 9240 
40° [1.709 4517 |2.909 0670 | 4.915 3201 | 8.247 7405 | 1.374 6234 28 
4! 21334 8609 | 4.044 3126 | 6.953 3796 | 1.186 8700 782.011 6440 
42 13-057 5560 | 5.392 4169 | 9.436 7295 |1.639 O109 | 2.825 8809 
43 13-839 7214 | 6.897 2774 | 1.228 9694 78) 2.172 6424 | 3.811 6533 
44 [4.625 1190 |8.464 8404 |1.536 2118 ° | 2.765 1812 | 4.937 8236 
45 | 5-344 5820 | 9.969 7009 | 1.843 4541 | 3.379 6659 | 6.144 8471. 
46 |5.925 5148 | 1.127 0097 78) 2.123 9798 | 3.967 4339 17-347 0998 
47 6.303 7391 | 1.222 9254 12.349 9351 14.473 9148 | 8.441 3487 
48 [6.435 0670 | 1.273.8806 | 2.496 8060 14.846 7411 | 9.320 6559 
49 1.273 8806 | 2.547 7612 | 5.044 5672 | 9.891 3083 
50 5.044 5672 !1.008 9134 29 
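The entries above appear to be binomial coefficients printed as an eight-figure mantissa followed by a power of ten (for example, m = 100, n = 50 reads 1.008 9134 with exponent 29). The following is a minimal sketch, under that reading, of how a single entry can be regenerated and cross-checked. The function name `table_entry` is hypothetical and not from the source.

```python
import math


def table_entry(m: int, n: int) -> tuple[str, int]:
    """Return C(m, n) as (mantissa, power of ten), rounded to eight figures.

    Assumption: the table lists the number of combinations of m things
    taken n at a time in scientific notation.
    """
    value = math.comb(m, n)  # exact integer value
    # Formatting an int with 'e' converts it to float, which is accurate
    # enough for eight significant figures at this magnitude.
    mantissa, exponent = f"{value:.7e}".split("e")
    return mantissa, int(exponent)


if __name__ == "__main__":
    print(table_entry(100, 50))
    # Pascal's rule links adjacent columns of the table:
    # C(97, 20) = C(96, 20) + C(96, 19)
    print(math.comb(97, 20) == math.comb(96, 20) + math.comb(96, 19))
```

Pascal's rule is a convenient consistency check when a printed digit is doubtful, since each entry in one column is the sum of two entries in the column to its left.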




IV. THE NORMAL ERROR FUNCTION

   y     Φ(y)        y     Φ(y)        y     Φ(y)
  .00   1.0000      .40   0.6892      .80   0.4237
  .01    .9920      .41    .6818      .81    .4179
  .02    .9840      .42    .6745      .82    .4122
  .03    .9761      .43    .6672      .83    .4065
  .04    .9681      .44    .6599      .84    .4009
  .05   0.9601      .45   0.6527      .85   0.3953
  .06    .9522      .46    .6455      .86    .3898
  .07    .9442      .47    .6384      .87    .3843
  .08    .9362      .48    .6312      .88    .3789
  .09    .9283      .49    .6241      .89    .3735
  .10   0.9203      .50   0.6171      .90   0.3681
  .11    .9124      .51    .6101      .91    .3628
  .12    .9045      .52    .6031      .92    .3576
  .13    .8966      .53    .5961      .93    .3524
  .14    .8887      .54    .5892      .94    .3472
  .15   0.8808      .55   0.5823      .95   0.3421
  .16    .8729      .56    .5755      .96    .3371
  .17    .8650      .57    .5687      .97    .3321
  .18    .8572      .58    .5619      .98    .3271
  .19    .8493      .59    .5552      .99    .3222
  .20   0.8415      .60   0.5485     1.00   0.3173
  .21    .8337      .61    .5419     1.01    .3125
  .22    .8259      .62    .5353     1.02    .3077
  .23    .8181      .63    .5287     1.03    .3030
  .24    .8103      .64    .5222     1.04    .2984
  .25   0.8026      .65   0.5157     1.05   0.2937
  .26    .7949      .66    .5093     1.06    .2891
  .27    .7872      .67    .5029     1.07    .2846
  .28    .7795      .68    .4965     1.08    .2802
  .29    .7718      .69    .4902     1.09    .2757
  .30   0.7642      .70   0.4839     1.10   0.2713
  .31    .7566      .71    .4777     1.11    .2670
  .32    .7490      .72    .4715     1.12    .2627
  .33    .7414      .73    .4654     1.13    .2585
  .34    .7339      .74    .4593     1.14    .2543
  .35   0.7263      .75   0.4533     1.15   0.2502
  .36    .7189      .76    .4473     1.16    .2460
  .37    .7114      .77    .4413     1.17    .2420
  .38    .7039      .78    .4354     1.18    .2380
  .39    .6965      .79    .4295     1.19    .2341

The definition of the Error Function is $\Phi(y) = \sqrt{\dfrac{2}{\pi}} \int_y^{\infty} e^{-y^2/2}\, dy$
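The tabulated values behave like the two-tailed normal probability: Φ(0) = 1, Φ(1.00) = 0.3173, and the critical values 1.645 and 2.576 correspond to .1 and .01. A minimal sketch under that reading follows. It uses the standard identity that this two-tailed probability equals the complementary error function evaluated at y/√2. The function name `error_function_table` is hypothetical.

```python
import math


def error_function_table(y: float) -> float:
    """Two-tailed normal probability, sqrt(2/pi) * integral from y to
    infinity of exp(-t^2 / 2) dt.

    Assumption: this is the quantity tabulated as the "Error Function";
    it equals erfc(y / sqrt(2)).
    """
    return math.erfc(y / math.sqrt(2.0))


if __name__ == "__main__":
    for y in (0.0, 0.4, 1.0, 1.645, 2.576):
        print(f"{y:5.3f}  {error_function_table(y):.4f}")
```

Note that this differs from the modern `erf` convention, so values read from the table should not be compared directly with `math.erf`.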


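   y     Φ(y)
 1.20   0.2301
 1.21    .2263
 1.22    .2225
 1.23    .2187
 1.24    .2150
 1.25   0.2113
 1.26    .2077
 1.27    .2041
 1.28    .2006
 1.29    .1971
 1.30   0.1936
 1.31    .1902
 1.32    .1868
 1.33    .1835
 1.34    .1803
 1.35   0.1770
 1.36    .1738
 1.37    .1707
 1.38    .1676
 1.39    .1645
 1.40   0.1615
 1.41    .1585
 1.42    .1556
 1.43    .1527
 1.44    .1499
 1.45   0.1471
 1.46    .1443
 1.47    .1416
 1.48    .1389
 1.49    .1362
 1.50   0.1336
 1.51    .1311
 1.52    .1285
 1.53    .1260
 1.54    .1236
 1.55   0.1211
 1.56    .1188
 1.57    .1164
 1.58    .1141
 1.59    .1118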


   y     Φ(y)        y     Φ(y)        y     Φ(y)
 2.40   0.0164     2.70   0.0069     3.00   0.0027
 2.41    .0160     2.71    .0067     3.10    .0019
 2.42    .0155     2.72    .0065     3.20    .0014
 2.43    .0151     2.73    .0063     3.30    .0010
 2.44    .0147     2.74    .0061     3.40    .0007
 2.45   0.0143     2.75   0.0060     3.50   0.0005
 2.46    .0139     2.76    .0058     3.60    .0003
 2.47    .0135     2.77    .0056     3.70    .0002
 2.48    .0131     2.78    .0054     3.80    .0001
 2.49    .0128     2.79    .0053     3.90    .0001
 2.50   0.0124     2.80   0.0051     4.00   0.0001
 2.51    .0121     2.81    .0050
 2.52    .0117     2.82    .0048
 2.53    .0114     2.83    .0047
 2.54    .0111     2.84    .0045
 2.55   0.0108     2.85   0.0044
 2.56    .0105     2.86    .0042
 2.57    .0102     2.87    .0041
 2.58    .0099     2.88    .0040
 2.59    .0096     2.89    .0039
 2.60   0.0093     2.90   0.0037
 2.61    .0091     2.91    .0036
 2.62    .0088     2.92    .0035
 2.63    .0085     2.93    .0034
 2.64    .0083     2.94    .0033     1.645   .1
 2.65   0.0081     2.95   0.0032     2.576   .01
 2.66    .0078     2.96    .0031     3.291   .001
 2.67    .0076     2.97    .0030     3.890   .0001
 2.68    .0074     2.98    .0029     4.417   .00001
 2.69    .0071     2.99    .0028     4.892   .000001


V. THE NORMAL LAW, ITS INTEGRAL,

  y     φ_{-1}(y)      φ(y)        φ'(y)       φ''(y)
 0.0    +0.50000    +0.39894    -0.00000    -0.39894
 0.1     0.53983     0.39695     0.03970     0.39298
 0.2     0.57926     0.39104     0.07821     0.37540
 0.3     0.61791     0.38139     0.11442     0.34706
 0.4     0.65542     0.36827     0.14731     0.30935
 0.5    +0.69146    +0.35207    -0.17603    -0.26405
 0.6     0.72575     0.33322     0.19993     0.21326
 0.7     0.75804     0.31225     0.21858     0.15925
 0.8     0.78814     0.28969     0.23175     0.10429
 0.9     0.81594     0.26609     0.23948    -0.05056
 1.0    +0.84134    +0.24197    -0.24197    +0.00000
 1.1     0.86433     0.21785     0.23964     0.04575
 1.2     0.88493     0.19419     0.23302     0.08544
 1.3     0.90320     0.17137     0.22278     0.11824
 1.4     0.91924     0.14973     0.20962     0.14374
 1.5    +0.93319    +0.12952    -0.19428    +0.16190
 1.6     0.94520     0.11092     0.17747     0.17304
 1.7     0.95543     0.09405     0.15988     0.17775
 1.8     0.96407     0.07895     0.14211     0.17685
 1.9     0.97128     0.06562     0.12467     0.17126
 2.0    +0.97725    +0.05399    -0.10798    +0.16197
 2.1     0.98214     0.04398     0.09237     0.14998
 2.2     0.98610     0.03547     0.07804     0.13622
 2.3     0.98928     0.02833     0.06515     0.12152
 2.4     0.99180     0.02239     0.05375     0.10660
 2.5    +0.99379    +0.01753    -0.04382    +0.09202
 2.6     0.99534     0.01358     0.03532     0.07824
 2.7     0.99653     0.01042     0.02814     0.06555
 2.8     0.99744     0.00792     0.02216     0.05414
 2.9     0.99813     0.00595     0.01726     0.04411
 3.0    +0.99865    +0.00443    -0.01330    +0.03545
 3.1     0.99903     0.00327     0.01013     0.02813
 3.2     0.99931     0.00238     0.00763     0.02203
 3.3     0.99952     0.00172     0.00568     0.01704
 3.4     0.99966     0.00123     0.00419     0.01301
 3.5    +0.99977    +0.00087    -0.00305    +0.00982
 3.6     0.99984     0.00061     0.00220     0.00732
 3.7     0.99989     0.00042     0.00157     0.00539
 3.8     0.99993     0.00029     0.00111     0.00392
 3.9     0.99995     0.00020     0.00077     0.00282
 4.0    +0.99997    +0.00013    -0.00054    +0.00201

The notation is: $\varphi(y) = \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/2}$,


AND ITS DERIVATIVES UP TO THE SIXTH

  y      φ'''(y)     φ^iv(y)     φ^v(y)      φ^vi(y)
 0.0    +0.00000    +1.19683    -0.00000    -5.98413
 0.1     0.11869     1.16708     0.59146     5.77625
 0.2     0.23150     1.07990     1.14197     5.17112
 0.3     0.33295     0.94130     1.61420     4.22226
 0.4     0.41835     0.76070     1.97770     3.01221
 0.5    +0.48409    +0.55010    -2.21141    -1.64481
 0.6     0.52783     0.32309     2.30517     0.23237
 0.7     0.54863    +0.09371     2.26012    +1.11354
 0.8     0.54694    -0.12468     2.08800     2.29382
 0.9     0.52445     0.32034     1.80951     3.23026
 1.0    +0.48394    -0.48394    -1.45182    +3.87153
 1.1     0.42895     0.60909     1.04580     4.19585
 1.2     0.36352     0.69255     0.62301     4.21034
 1.3     0.29184     0.73413    -0.21300     3.94753
 1.4     0.21800     0.73642    +0.15897     3.45953
 1.5    +0.14571    -0.70425    +0.47355    +2.81094
 1.6     0.07809     0.64405     0.71813     2.07125
 1.7    +0.01759     0.56316     0.88702     1.30785
 1.8    -0.03411     0.46915     0.98090    +0.58014
 1.9     0.07605     0.36928     1.00583    -0.06467
 2.0    -0.10798    -0.26996    +0.97184    -0.59390
 2.1     0.13024     0.17646     0.89150     0.98987
 2.2     0.14360     0.09274     0.77844     1.24885
 2.3     0.14920    -0.02141     0.64604     1.37883
 2.4     0.14834    +0.03623     0.50642     1.39654
 2.5    -0.14242    +0.07997    +0.36974    -1.32421
 2.6     0.13279     0.11053     0.24376     1.18645
 2.7     0.12071     0.12926     0.13381     1.00761
 2.8     0.10727     0.13793    +0.04287     0.80970
 2.9     0.09339     0.13850    -0.02810     0.61102
 3.0    -0.07977    +0.13296    -0.07977    -0.42546
 3.1     0.06694     0.12313     0.11395     0.26242
 3.2     0.05523     0.11066     0.13319     0.12712
 3.3     0.04485     0.09690     0.14036    -0.02130
 3.4     0.03586     0.08290     0.13840    +0.05607
 3.5    -0.02825    +0.06943    -0.13000    +0.10784
 3.6     0.02194     0.05703     0.11755     0.13802
 3.7     0.01680     0.04599     0.10297     0.15102
 3.8     0.01269     0.03646     0.08777     0.15124
 3.9     0.00946     0.02842     0.07302     0.14264
 4.0    -0.00696    +0.02181    -0.05942    +0.12861

$\varphi_{-1}(y) = \int_{-\infty}^{y} \varphi(y)\, dy$, $\quad \varphi^{(i)}(y) = \dfrac{d^{i} \varphi(y)}{dy^{i}}$
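Any entry of this table can be regenerated from the standard normal density. The sketch below assumes the standard identity that the n-th derivative of the density equals (-1)^n He_n(y) times the density, where He_n is the probabilists' Hermite polynomial built by the recurrence He_{n+1}(y) = y He_n(y) - n He_{n-1}(y). The function names are hypothetical and not taken from the source.

```python
import math


def phi(y: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def phi_integral(y: float) -> float:
    """Integral of the density from minus infinity to y."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def phi_derivative(y: float, n: int) -> float:
    """n-th derivative of the density, n >= 0, via Hermite polynomials."""
    if n == 0:
        return phi(y)
    h_prev, h = 1.0, y  # He_0 and He_1
    for k in range(1, n):
        # He_{k+1}(y) = y * He_k(y) - k * He_{k-1}(y)
        h_prev, h = h, y * h - k * h_prev
    return (-1) ** n * h * phi(y)


if __name__ == "__main__":
    y = 1.0
    row = [phi_integral(y)] + [phi_derivative(y, n) for n in range(7)]
    print(" ".join(f"{v:+.5f}" for v in row))
```

Because the table prints signs only on every fifth row and at sign changes, regenerating a row in this way is a quick method of confirming the sign of an unsigned entry.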


458 VI. THE POISSON FORMULA, *P’(,/) 

 j      ε=0.1      ε=0.2      ε=0.3      ε=0.4      ε=0.5
° 9.04847) 8.187371 7.408271 6.7032} 6.06537} 
I oe 1.6375 : 2) 2225 2.6813 . 3:0337_, 
2 4.5242- 1.6375~ 3-33377 5.3626 7-5816- 
3 1. soa ~* 1.0916~3 3.3337 7.1501 —* 1.2636 

4 3-7702-8 | 5.4582- 2. 5003 7-150lae || ease 
5 2.1833 ~° 1.500275 5.7201 ~> 1.5795~* 
6 3.81347 .3163-° 




 j      ε=6      ε=7      ε=8      ε=9      ε=10
° 2.478873 g. 1188-4 335467 = 1.2j4r—* 45404 
I 1.487377 6. 3832-3 2.6837~3 1.110773 4.54007 
2 4.4618 2.2341 ~? 1.07357" 4.9981 2.2700 

3 8.9235 5.2129 2.8626 1.49947 7.5667 

4 1.3385-" | 9.1226 5.7252 3-3737 1.8917~ 
5 1.6062 1.9772! 9.1604 6.0727 3.7833 

6 1.6062 1.4900 1.291473 g. 1090 6.3055 

7 1.3768 1.4900 1.3959 r.1712-! 9.0079 

8 1.0326 1. 3038 1.3959 1.3176 1.12607! 
9 6.8838 -7 1.0140 1.2408 1.3176 pulsar 
Io 4.1303 7.0983~2 9.92622 1.1858 r 1.251% 
11 2.2529 4.5171 7.2190 9 .7020~ 1.1374 
12 1.1264 2.6350 4.8127 7.2765 9.4780~ 
13 5.199073 1.4188 2.9616 5.0376 7.2908 
14 2.2281 7.0942~3 1.6924 3.2384 5.2077 
15 8.9126-4 3-3106 g.0260~3 1.9431 3.4718 
16 3-3422 1.4484 4.5130 1.0930 2.1699 
17 1.1796 5.9640-* 2.1238 5.7863 - 1.2764 
18 3.93207° 2.3193. 9.4389 7% 2.8932 7.0911 ~3 
19 1.2417 8.5449— 3-9743 1.3704 3-7322 


 j      ε=6      ε=7      ε=8      ε=9      ε=10

2 -6 48). eegheae a ete ane 1.8661 ~3 
20 3.7251 2.9907 1.5897 . 1670 
21 1.0643 9.9690~8 6.0561 -> 2.6430 8.8861 —4 
a8} 3.1720 2.2022 1.0812 4.0391 
23 7.6598—6 4.2309~° 1.7561 : 
24 2.5533 1. 5866 7-31737 
as 5.7117 ° | 92.9269 
26 1.9771 1.1257 
27 4.1694 
28 1.4891 
 j      ε=11      ε=12      ε=13      ε=14      ε=15
fo) 1.6702~5 6.14428 2.2603 ~ 
I 1.837274 5.4931 -° 2-9384—° E1641? 4.58858 
2 T.oloccs 4.4238 1.g100~* 8.1490 3.4414 
3 3.7050 1.7695 - 8.2766 3.8029 —* 1.7207~4 
4 1.01897 5.3086 2.6899 ~3 1.33107° 6.4526 
s 2.2416 hoo 7ad 2 6.9937 3.7268 1.93582 
6 4.1095 2.5481 Tage 8.6959 4.8395 
7 6.4577 4.3682 2.8141 1.7392 1.0370- 
8 ote 6.5523 4.5730 3.0436 1.9444 
9 1.08537 8.7364 6.6054 4-7344 3+2407 
10 1.1938 1.048471 8.5870 6.6282 4.8611 
II 1.1938 E437 1.014871 8.4359 6.6287 
12 1.0943 1.1437 1.0994 9.8418 8.2859 
13 9.2595 ~ 1.0557 | 1.0994 1.05997! | 9.5607 
14 7.2753 9 -0489- 1.0209 1.0599 1.92447 
15 5.3352 7.2391 8.8475-? | 9.8923-% | 1.0244 ; 
16 3.6680 5.4293 7.1886 8.6558 9.6034 - 
17 2.3734 3.8325 5.4972 7-1283 8.4736 
18 1.4504 : 2.5550 3-9702 5.5442 7.0613 
19 8.3971 ~ 1.6137 2.7164 4.0852 5.5747 
20 4.6184 9 .6820-3 1.7657 2.8597 4.1810 
21 2.4192 5.5326 I .0930 1.9064 2.9865 
ap) 1.2096 3.0178 6.4589- E2132 2.0362 
23 S-y8ag-" | 1.5745 | | 3.6507 7-3846-3 | 1.3280 
24 2.6514 87255 1.9775 4.3077 8.2998-8 


PNG) ere GSMS 
PrO203p |e 2c4tag—* | 4197998 

eatase 1.2989 2.8730 
2.4756 6.73524 1.5961 
1.1493 3.3676 8.5506~4 
B.trae-5 1.6267 4.4227 
2.2326 7.5868-5 2.2114 
g. 362576 3.4263 1.0700 
3.8035 1.4990 5.015775 
1.4984 6.35947 2.2799 
2.6186 1.0058 
1.0474 4.3107 © 
1.7961 
ε=18      ε=19      ε=20
2.467376 t.0119~° 
1.480475 6.4049 2.7482~® 
6.6616 3.0423 - Dyan o 
2.3982-4 es 5.4964 
7.1945 3.6610 1.83214 
1.8500-% | 9.9369 5.2347 
4.1625 2.36007 1.30873 
8.3251 4.9822 2.9082 
b.g9857" 9.4662 5.8163 
2.4521 1.6351 -2 1.09752 
3.6782 2.5889 1.7625 
5.0929 3.7837 2.7116 
6.5480 5.1351 3.8737 
7.8576 6.5044 5.1649 
8.8397 7.7240 6.4561 
9-3597 8.6327 7-5954 
9.3597 Ouitas 8.4394 
8.8671 9.1123 8.8835 



 j      ε=16      ε=17      ε=18      ε=19      ε=20
20 5.592072 6.915977 7.98047 8.6567 —7 8.8835 —7 
21 4.2605 5.5986 6.8403 7.8323 8.4605 
22 3.0986 4.3262 5.5966 6.7642 7.6914 

23 2.1555 3.1976 4.3800 5.5878 6.6881 
24 1.4370 2.2650 3.2850 4.4237 5.5735 
25 g. 1969-8 1.5402 2.3652 3.3620 4.4588 
26 5.6596 1.0070 1.6374 2.4569 3.4298 

27 Bee 59 6.3406 -8 1.0916 1.7289 2.5406 
28 1.9165 3.8497 7.0176-9 1.1732 1.8147 
29 1.0574 2.2567 4.3558 7.6864-3 1.251% 
30 5 6393-4 1.2788 2.6135 4.8680 8.343572 
31 2.9106 7.0128 1.5175 2.9836 5.3829 
32 1.4553 3-7255 8.53597 1.7715 3- 3643 
33 7.0561~9 1.9192 4.6559 1.0200 2.0390 
34 3-3205 9.5961-° | 2.4649 5.6998-* | 1.1994 
35 1.5179 4.6609 1.2677 3.0942 6.8537-* 
36 6.7464-8 2.2010 6.33837° 1.6330 3.8076 
37 2.9174 1.0113 3.0835 8.3859~° 2.0582 
38 1.2284 4.5241 1.4606 4.1930 I (0833 
39 1.9720 6i74tg-© 4} 230497 5.5551? 
40 310336 9-7030-8 | 2.7776 

41 1.3318 4.4965 1.3549 
42 2.0341 6.4520-8 
43 3.0009 
44 1.3641 
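The individual terms tabulated above are consistent with the Poisson formula ε^j e^{-ε} / j! (for instance, j = 0 with ε = 0.1 gives e^{-0.1} = 0.90484). The following is a minimal sketch for regenerating a single term, assuming that reading of the table. The function name `poisson_term` is hypothetical. The computation is carried out in logarithms so that large j does not overflow the factorial.

```python
import math


def poisson_term(eps: float, j: int) -> float:
    """Poisson probability of exactly j events when the mean is eps (> 0).

    Computed as exp(j*ln(eps) - eps - ln(j!)) to stay within floating
    range for large j.
    """
    return math.exp(j * math.log(eps) - eps - math.lgamma(j + 1))


if __name__ == "__main__":
    for eps, j in ((0.1, 0), (10.0, 10), (20.0, 20)):
        print(f"eps={eps:5.1f}  j={j:2d}  {poisson_term(eps, j):.4e}")
```

Printing in the `.4e` format mirrors the table's layout of a five-figure mantissa with a power of ten.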


VI. 


THE POISSON FORMULA, “I 


fe) I .0000 2 I .0CO0O0 I.0000 ince 4 i ae 

I 9.5163 7° 1.81277) 2.5918 —! 3.296871 3-9348- | 
2 ok ae ae 3.6936 ~2 6.1552-2 g .0204— 
3 1.54657 | 1.14857 3-5995~ 7.926378 | 1.4388 

4 3.8468 5.68407 2.6581 ~4 7.7625~4 1.751@-3 
5 2.25826 1.578575 6.124375 1.72124 
6 4.0427~8 1.416g-5 
hs 1.00247 
 ν      ε=0.6      ε=0.7      ε=0.8      ε=0.9

° I.0000 1.0000 1.Q000 1.0000 

I 4.511971 5.03411 5.5067 -7 5.9343 ~ 

2 1.2190 2 1.5580 T.g121 2.2752 é 

3 aR oo oe 6.2857 

4 3.3581 5-7535~ 9-0799~ 1.3459 

5 3.9449" 7.85547 V.4IT3 | 2.34412 

6 3.8856— co bee 24097, 
Gi; 3.293178 8.8836- 2.07477 4.3401 — 
ae 2.05028 4.82726 


I .0000 
g. 8168-1 
9.0842 
7.6190. 
5.6653 


Gis 


I .0O0O. 
9.932671 
9.5957 
8.7535 
7.3497 


I .0000 
9.9909~1 
9.9270 
9.7036 
9.1823 


8.2701 
6.9929 


1.0000 
9.9966 -! 
9.9698 
9.8625 
9.5762 


90037 
8.0876 


6.8663 


5.4704 
4.0745 


2.8338 
1.8411 
I. 1192 
6.3797-? 
3.4181 


1.7257 
8.231073 
3.7180 
1.5943 
6.5037-4 



VII. THE POISSON FORMULA, °II; 


 ν      ε=6      ε=7      ε=8      ε=9
20 5.18026 @24g02~° 2.529473 1.056078 
a1 1.4551 1.4495 9.3968" | 4.39254 
22 4552637 > (43.3407 1.7495 
23 1.3543 1.1385 6 6828-5 
24 B7esS =O « B245T9 
26 1.1722 8.6531 ~© 
26 2.9414 
27 

28 


° Te 
I Oy. 
2 9. 
3 ee 
4 uP 
5 9: 
6 9. 
T wh 
8 8. 
9 7: 
10 6. 
II ie 
12 4. 
3 gis 
14 2. 
15 it 
16 9. 
17 is 
18 Cee 
19 ike 
20 9. 
21 4. 
22 De 
23 iB 
24, 4 


0000 I .0000 

9998 —! 9.99997! 1.0000 1.0000 | 
9980 9.9992 9.9997-' | 9.99997 
9879 9.9948 9.9978 9 9991 
9508 9 9771 9-9895 9-9953 
8490 9.9240 9.9626 9.9819 
6248 9.7966 g 8927 9-9447 
2139 9.5418 9.7411 9.8577 
5681 g.1050 9-4597 9.6838 
6801 8.4497 9.0024 9-3794 
5949 7.5761 8.3419 8.go60 
4011 6.5277 7.4832 8.2432 
2073 5.3840 6.4684 7.3996 
1130 4.2403 5.3690 6.4154 
1871 _ 3.1846 4.2696 5.3555 
4596 2.2798 3.2487 4.2956 
2604-7 1.5558 2.3639 3.3064 
5924 1.0129 1.6451 2.4408 
2191 6 .2966~? 1.0954 1.7280 
7687 3.7416 6.983377 1.1736 
2895 -% 2.1280 4.2669 7.6505 ~2 
6711 1.1598 2.5012 4.7908 
2519 6.0651-% 1.4081 2.8844 
0423 3.0474 7.6225-% 1.6712, 
6386-4 | 1.4729 3.9718 9.3276 - 


466 VII. THE POISSON FORMULA, “II, 


 ν      ε=11      ε=12      ε=13      ε=14      ε=15
25 1.9871 —* 6.8563-4 1.994372 5.0199~3 Tti6g=" 
26 8.205075 3.0776 9.6603 ~4 2.6076 6.1849~% 
27 3.2693 1.3335 4.5190 A eT 3.3119 
28 1.2584 5.5836— 2.0435 Se skeen 1.7158 
29 4.68476 2.2616 8.9416 ~5 2.9837 8.6072~4 
30 1.6882 8.8701 ~§ 3.7894 1.3580 4.1845 
31 3.3716 1.5568 5.9928 1.9731 
32 1.2432 622052 = 2.5665 9.0312 
33 2.4017 1.0675 p 4.0155 
34 4.3154— 1.7356 
35 1.6968 7.2978 -8 
36 2.9871 
37 1.1910 


 ν      ε=16      ε=17      ε=18      ε=19      ε=20
(e) 

I 

2 I .0000 I .0000 

3 9.9998-1 9.9999 1.0000 1.0000 

4 9.9991 9.9996 9.9998 - 9.999971 1.0000 
5 9.9960 9.9982 9.9992 9.9996 9.99981 
6 9.9862 9-9933 9.9968 9.9985 9.9993 
7 9.9599 9-9794 9.9896 9.9948 9.9974 
8 9.9000 9.9457 9-9711 9.9849 9-9922 
9 9.7801 9.8740 9.9294 9.9613 9.9791 
Io 9.5670 9.7388 9.8462 9.9114 9.9500 
It 9.2260 9. 5088 9.6963 9.8168 9.8919 
12 8.7301 9.1533 9.4511 9.6533 9.7861 
ig 8.0688 8.6498 9.0833 9-3944 9.6099 
14 7-2545 7:9913 8.5740 9.0160 9.3387. 
15 6.3247 7.1917 7.9192 8.5025 8.9514 
16 5.3326 6.2855 7.1335 our 8.4349 
17 4.3404 5 «3226 6.2495 7:°797 7: 7893.7 
18 3.4066 4.3598 5.3135 6.2164 7.0297 
19 2.5765 34504 4.3776 5.3052 6.1858 


VII. THE POISSON FORMULA, “II; 467 

 ν      ε=16      ε=17      ε=18      ε=19      ε=20
20 Wiggs | earages 7 4)! 3.4908" | 4393971} 52297477 
emi 1.3183 1.9452 2.6928 3.5283 4.4091 
22 8.9227 -2 1.3853 2.0088 2.7450 3.5630 
23 5.8247 g: 272? 1.4491 2.0687 2.7939 
24 3.6686 6.3296 I.o1rl 1.5098 2.1251 
25 2.2316 4.0646 6.8260-2 1.0675 1.5677 
26 1.3119 2.5245 4.4608 7.312672 1.1218 3 
27 7-4589-° | 1.5174 2.8234 4.8557 7-7887- 
28 4.1051 8.8335 — 1.7318 3.1268 5.2481 
29 2.1886 4.9838 1.0300 1.9536 3-4334 
30 1312 2.7272 6.9443-° 1.1850 2.1818 
Ag 5.6726-* 1.4484 3.3308 6.9819~% 1.3475 
Qa 2.7620 7.4708 1.8133 3-9982 8.0918 
33 1. 3067 3-7453 9.5975 * | 2.2267 4.7274 
34 6.0108~5 1.8260 4.9416 1.2067 2.6884 
35 2.6903 8.6644~° 2.4767 6.3674 —* 1.4890 
36 Dek 724 4.0035 1.2090 Be Fee 8 .0366— 
37 4.9772~° 1.8025 Pasty? 1.6401 4.2290 
38 2.0599 7.91237 2.6684 8.01547 2.1708 
39 3.3882 1.2078 3.8224 1.0875 
40 1.4162 5-3365-© |) 1.7797, | §.32027° 
4I 2. 3030 8.09407 2.5426 
ad 3.5975 1.1877 
43 1.5634 erage) 
44 2.4243 
45 1.0603 
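The entries of this second Poisson table begin at 1.0000 for ν = 0 and fall off as ν grows, which is consistent with the probability of at least ν events, that is, the sum of the individual Poisson terms from j = ν upward (for ε = 0.1 and ν = 1 this gives 1 - e^{-0.1} = 0.095163). The sketch below assumes that reading. The function names are hypothetical. The tail is summed directly rather than as one minus the lower terms, so that very small entries keep their significant figures.

```python
import math


def poisson_term(eps: float, j: int) -> float:
    """Poisson probability of exactly j events when the mean is eps (> 0)."""
    return math.exp(j * math.log(eps) - eps - math.lgamma(j + 1))


def poisson_tail(eps: float, nu: int, tol: float = 1e-16) -> float:
    """Probability of at least nu events: sum of terms for j >= nu."""
    term = poisson_term(eps, nu)
    total = 0.0
    j = nu
    while True:
        total += term
        j += 1
        term *= eps / j  # ratio of successive Poisson terms
        # Stop once past the mode and the next term is negligible.
        if j > eps and term <= tol * total:
            return total


if __name__ == "__main__":
    for eps, nu in ((0.1, 1), (16.0, 20), (20.0, 20)):
        print(f"eps={eps:5.1f}  nu={nu:2d}  {poisson_tail(eps, nu):.4e}")
```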


VIII. PEARSON'S CRITERION


P = .98     P = .95     P = .90     P = .80     P = .70
©.000628 0.00393 0.0158 0.0642 0.148 
0.0404 0.103 0.211 0.446 0.713 
0.185 0.352 0.584 1.005 1.42 

0.429 o.71I 1.064 1.649 2.195 
Ons 1.145 1.610 2348 3.000 
1.134 1.635 2.204. 3.070 3.828 
1.564 2.167 2838 3.822 4.671 
23032 ZOIRE: 3-499 4-594 5 Sey. 
214532 ages 4.168 5.380 6.393 
3-059 3-940 4.865 6.179 73267; 
3609 4.575 5.578 6.989 8.148 
4.178 5.226 6.304 7.807 9-034 
4.765 5.892 7.042 . 8.634 9.926 
5.368 6.571 7.790 9.467 10.821 
5.985 7.261 8.547 10.307 11.721 
6.614 7.962 Ougts Lr 152 912,604) 
Fa25 5 8.672 TO.,.085 12.002 13-531 
7.906 9.390 10.865 12.867 14.440 
8.567 Welnity) 11.651 13.716 LS Le) 
9.237 10.851 12.443 14.578 16.266 
9-915 II.§91 13.240 15.445 17.182 
10.600 12.338 14.041 16.314 18.101 
11.293 13.091 14.848 17.187 Ig .021 
11.992 13.848 15.659 18.062 19.943 
12.697 14.611 16.473 18.940 20.867 
13.409 15.379 7202: 19.820 21.792 
14.125 16.151 18.114 20.703 22.719 
14.847 16.928 18.939 21.588 23.647 
15.574 17.708 19.768 22.475 24.577 
16.306 18.493 20.599 23.364 25.508 

Taken from Statistical Methods for Research Workers, by R. A. Fisher. Published by Oliver & Boyd, Edinburgh.


OF GOODNESS OF FIT, P(>χ²)


 s    P = .50    P = .30    P = .20    P = .10    P = .05    P = .02    P = .01


I 0.455 1.074 1.642 2.706 3.841 5.412 6.635 
D} 1. 386 2.408 E219 4.605 5.991 7.824 9.210 
Ri} 2.366 3-665 4.642 6.251 7. Sts Oncgy es eluant 
4 3-357 4.878 5.989 7-779 | 9-488 | 11.668 | 13.277 
5 4-351 6.064 7.289 9-236 | 11.070 | 13.388 | 15.086 
6 5.348 yfeekeu 8.558 TO.645 | 12.592 | 16.033 | 16.812 
7 6.346 8.383 9-803 L2GOI7 P1407 |, 16.1622 |} 18.475 
8 7.344 9.524 II .030 13.362 | 15.507 | 18.168 | 20.090 
9 8.343 10.656 | 12.242 14.684 | 16.919 | 19.679 | 21.666 
10 9-342 11.781 53442) 152987 | 181407 |. 215261 || 23200 


II 10.341 12.899 14.631 T7e2p ye) LG.075) I 222008: | 24.709 
12 II.340 14.011 15.812 D3 5Agme2T.026) 24. Oc4.) | 260217 
13 12.340 1§.119 16.985 TOV ONO 22202, He Oca aon Nl O68 
14 13.339 16.222 18.151 21.064 | 23.685 | 26.873 | 29.141 
15 TARA Mt 9 27322) 19.311 22807 24900) s|929.259 51) (ROG 78 


16 1§.338 18.418 20.465 23.542 | 26.296 | 29.633 | 32.000 
107) 16. 338 19.511 Tie UG 24.769 | 27.587 | 30.995 | 33.409 
18 17.338 20.601 22.760 25.989 | 28.869 | 32.346 | 34.805 
19 18 . 338 21.689 23.900 27204) ke30. laa) 3340c7 | 36.090 
20 19.337 e775 25.038 Deu | heer wen || tiynrepleye I Metra elsy 


21 201337. 23.858 26.070 AGuOUsE le 92vO7l mao ana nl) Berg ge 
oh) S337 24.939 27.301 30.883) ||| 335924, 1) 372659 1140-289 
23 92-337 26.018 28.429 32.007 | 35.172 | 38.968 | 41.638 
24 2312347 27.096 29.553 83. 196 we BOeats | 40.270 |) 424980 
25 24.337 28.172 30.675 - 1 34.382 | 37-652 | 41.566 | 44.314 


26 25.336 29.246 31.795 | 35.563 | 38.885 | 42.856 | 45.642 
5) 26 . 336 30.319 32.912 36.741 | 40.113 | 44.140 | 46.963 
28 | 27.336 | 31-391 | 34.027 | 37-916 | 41.337 | 45.419 | 48.278 
29 | 28.336 | 32.461 | 35.139 | 39-087 | 42.557 | 46.693 | 49.588 
30 29.336 33-530 36.250 40.256 | 43.773 | 47-962 | 50.892 


For larger values of s, use Appendix V, with $y = \sqrt{2s - 1} - \sqrt{2\chi^2}$ and
$P = \varphi_{-1}(y)$.
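The large-s rule can be sketched numerically. The code below assumes the note refers to Fisher's approximation, in which the square root of 2χ² is treated as normally distributed about the square root of 2s - 1 with unit standard deviation, so that P is the normal integral of Appendix V evaluated at y = sqrt(2s - 1) - sqrt(2χ²). The function names are hypothetical, and the result is only an approximation to the tabulated P.

```python
import math


def normal_integral(y: float) -> float:
    """Integral of the standard normal density from minus infinity to y."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def chi_square_tail_approx(chi2: float, s: int) -> float:
    """Approximate P(> chi2) for s degrees of freedom, intended for large s.

    Assumption: y = sqrt(2s - 1) - sqrt(2 * chi2), P = normal_integral(y).
    """
    y = math.sqrt(2.0 * s - 1.0) - math.sqrt(2.0 * chi2)
    return normal_integral(y)


if __name__ == "__main__":
    # Compare with the last row of the table (s = 30).
    for chi2, p_table in ((29.336, 0.50), (43.773, 0.05)):
        print(f"chi2={chi2:7.3f}  table P={p_table:.2f}  "
              f"approx P={chi_square_tail_approx(chi2, 30):.4f}")
```

Even at s = 30 the approximation agrees with the tabulated P to within about 0.01, and the agreement improves as s increases.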


470 


CHOICE OF DISTRIBUTION CURVES 


IX. STANDARD DEVIATIONS OF IMPORTANT STATISTICS * 


(N is the number of observations from which the statistic is computed.) 


Standard Deviation of Statistic 


Statistic 
| 
eS Sym-| Sym- General 
Name bol bol Formula 
nas o 
Average nm a(n) Va 
Standard iF = D2? 
Deviation o a(c) Va 
2 
Asymmetry ah 
(Skewness)| V/Bi | «(-VB1) 
Flatness 
(Kurtosis)| Be a (Be) 


Special formula for 


Poisson 
aw 


ce 
2eFi 
Vz I 


/? 


Binomial Law 


mp(1—?) 
Vv N 


(m—1)p(1—b) + (26 —1) 


4N 


* Taken from Statistical Methods for Research Workers, by R. A. Fisher. Published by Oliver & Boyd, Edinburgh.


X. CRITERIA FOR CHOICE OF DISTRIBUTION CURVES * 


LL 


J <6 Af 9) J > 
IV Normal II 
I 
EY. Ill Binomial ~ 
Poisson 


ee eee 


*The Gram-Charlier Series can represent any of these types. 


—eeeeSS 
| 


Symbol Normal Law Poisson Law 
Algebraic Definition i —(n—a)2/202 as 
21 o n}\ 
Ist Expectation of «:(7) a € 
and Expectation of 6 €2 aa : 
3rd Exvectation of 6 €3 ° € 
4th Expectation of 6 €4 304 e+ 3e mp (I 
Standard Deviation o o Ve 
Asymmetry (Skewness) Vv Bt ° a | 
Ve ; 
Flatness (Kurtosis) Bo 3 aol : 
381 — 2B2-+ 6 Ji ° >So 
a=n e=n 
c=0c 


Equations for Determining 


Constants 




INDEX 


Addition Theorem, 5, 12 
Asymmetry 

definition of, 298 
Average 

definition of, 177, 183 
Axioms, Fundamental, 4 


Bad Penny, The, 125-127 
Bayes’ Theorem, 117-132, 265-266
statement of, 121 
substitute for, 268-270 
uses of, 127-131 
Bernoulli's Theorem, 82-116 
proof of, 102-103, 108-111 
statement of, 95, 100 
Binomial Coefficients 
table of, 439-452 
(see also Combinations) 
Binomial Law 
application to traffic problems,
334-336, 345, 347
as empirical distribution function, 
304-305 
Gram-Charlier approximations for, 
206-213, 255-261 
relation to Normal Law, 208-211 
relation to Pearson’s Curves, 245-246 
relation to Poisson Law, 214-216 
relation to problem of independent
trials, 63, 91-101
standard deviation of statistics of, 
314, 470 
statistics of, 471 


Binomial Theorem, 29-31 


Certainty, 3, 4, 87-88 
Change of Variable, 150-163, 261-263 


Channel
definition of, 323 
Cogent Reason 
doctrine of, 6, 117-119
Collision of Molecules 
change of velocity on, 171-173,
390-393, 395-398
Combinations, 12-38 
definition of, 15 
fundamental formulae, 26, 28, 31, 60, 69 
tables of, 439-452 
Complete Group, 8 
Composition of Events 
laws of, 12-14 
Congestion 
problems of, 321-388 
Continuous Variables, 133-176, 183-186 
Control Charts, 315-317 
Conventions, Fundamental, 4 
Curve Fitting, 265-315 


Delays, 372-388 
at cooperative channels, 378-387 
at non-cooperative channels, 376-378 
effect upon traffic congestion, 374-376, 
380-382 
Deviation, 188-191 
Distribution Curve, 96-97 
(see also Distribution Functions) 


Distribution Functions, 96-97 
criteria for choice of, 296-302, 470, 471 
derived empirically, 144-146, 297-310 
empirical, 241-255 
for continuous variables, 141-146 
many variables, 147-150 
most frequently used, 205-264 
transformation of, 150-163, 261-263 


type criterion for, 300 
(see also Binomial Law, Engset Law, 
Erlang Law, Exponential Law, 
Gram-Charlier Series, Normal Law, 
Pearson’s Curves, and Poisson Law) 


Divergence 
definition of, 290


Engset Law 
application to traffic problems, 
340-342, 345, 347, 351-354
computation charts for, 352-353 
Equally Likely, 5, 56 
Erlang Law 
application to traffic problems, 
342-343, 345, 347
Error Function 
(see Normal Law) 
Exponential Law, 378-387 
Expectation 
definition of, 177-186 
mathematical, 179 
moral, 179, 195-196
of a probability, 199-202, 401-403 


Factorials, 20-25, 103-107 
tables of, 427-438 

Flatness 
definition of, 299 

Fluctuation Phenomena, 389-423 


Gamma Function 
(see Factorials) 


Gas Molecules 
change of velocity on collision, 171-173, 
390-393, 395-398
distribution of speed, 167 
distribution of velocity, 163-171 
in spherical coordinates, 166 


Goodness of Fit, 268-297 
relation to Bayes’ Theorem, 268-270 
the x? criterion, 280-291 
table of, 468-469 




Gram-Charlier Series, 251-261 
as an empirical distribution function,
307-310
for Binomial Law, 206-213, 255-261
for Poisson Law, 237-240, 261
table of Normal Law and its derivatives, 456-457


H-Function, The, 400-403 
Hermite Polynomials, 252-255 
Hunting Problems, 356-369 


Impossibility, 4, 138-140 
Independent Trials, 62-65, 82-116 


Insufficient Reason 
doctrine of, 6, 117-119 


Jacobian, 153-163 
general significance of, 173-174 


Kinetic Theory of Gases, 163-171, 390-417 
density fluctuations, 410-414 
diffusion, 414-417 
fundamental equation of, 399-400 
H-Function, the, 400-403 
Maxwell’s Law of velocity, 163-165, 
404-406 
mean free path, 409-410 
number of collisions, 409 
pressure, 406-408 
Kurtosis 
(see Flatness)


Limit, 83-86 

definition of, 85 

probability as a, 88-91 
Line Segment 

choice of point on, 133-141 


Maxwell’s Equation, 163-171 
change of variable in, 165-171 
derivation of, 163-165, 390-406 

Median 
definition of, 187-188 

Moment 
definition of, 183 


Multiplication Theorem, 12, 48-52,
113-116
Mutually Exclusive Events, 7 


Normal Law 
in one variable, 169 
in several variables, 165, 280-285 
logical standing of, 205, 241-244 
relation to Binomial Law, 208-211 
relation to Pearson’s Curves, 246 
relation to Poisson Law, 238 
standard deviation of statistics of,
314, 470
statistics of, 471
table of, and its derivatives, 456-457 
table of Error Function, 453-455 


Paradoxes 
Bertrand’s “Box,” 121-122, 131-132 
“Life on Mars,” 117-119 
“of the impossible,” 138-140 
“St. Petersburg,” 194-199 
Pascal’s Triangle, 27-29
Pearson’s Curves, 244-251 
Permutations, 12-38 
definition of, 15 
fundamental formulae, 25, 34, 36 
Poisson Law 
application to traffic problems,
227-229, 233-237, 336-338,
345, 347, 354-355
application to variable traffic density,
233-237
application to warehouse problem,
227-232

as empirical distribution function, 
sobre 


Gram-Charlier approximation to, 
237-240, 261 

relation to Binomial Law, 214-216 

relation to Normal Law, 238 

relation to random distributions, 
220-227 

standard deviation of statistics of,
314, 470

statistics of, 471 

tables of, 458-467 




Population 
definition of, 267 
Probability 
alternative compound, 53-81, 149 
as a statistical ratio, 9, 82-113
complementary, 39 
compound, 48-52, 113-116, 149 
conditional, 43-48, 148-149 
definition of, 1-11 
determined by experiment, 112, 119, 
125-132 
elementary principles of, 39-81 
fundamental theorems, 8, 48, 54, 63, 
65, 68 
irrational values of, 135-137 
measure of, 7 
unconditional, 39 
unit of measure for, 3, 4 
Psychic Research 
examples from, 41-42, 44, 45, 46, 51,
57-60


Random 
“collectively at random,” definition of,
218
“individually at random,” definition of,
218
non-random distribution, examples of,
143-146
Poisson Law appropriate to random
distribution, 220-227
“random distribution,” definition of, 141
Repeated Trials, 62-70, 91-101


Root Mean Square 
definition of, 183 


Schottky Effect, 417-423 
Schroteffekt
(see Schottky Effect) 
Sets of Numbers, 84-88 
bounded set, 86-88 
closed set, 86 
open set, 86 
Sheppard’s Corrections, 310-312 
Skewness 
(see Asymmetry) 


